dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

Case where toHTML doesn't display the content of the captures well #239

Closed: ghost closed this issue 6 years ago

ghost commented 6 years ago

Run this:

import pegged.grammar, pegged.peg, pegged.tohtml;

enum rules =
`   SB:
    Everything <- SheBangLine? Remaining
    SheBangLine <- "#!" (!endOfLine .)* endOfLine
    Remaining <- (!eoi .)*
`;

enum sample1 =
`#!excluded
included line 2
included line 3
  #!included line 4
`;
enum sample2 =
`
#!included b/c not first line
included line 2
included line 3
  #!included line 4
`;

mixin(grammar(rules));

void main()
{
    const ParseTree s1 = SB(sample1);
    const ParseTree s2 = SB(sample2);
    toHTML(s1, "s1.html");
    toHTML(s2, "s2.html");
}

and look at s2.html. The page doesn't display the captures, although they are present in the HTML source.

veelo commented 6 years ago

Confirmed. How would you propose to handle this case? We could insert ⏎ for every empty line at the beginning of the match. Have a look: change line 45 of s2.html to

     <details><summary>SB [0, 83] <code>&#x23ce;<span><pre>&#x23ce;

What do you think?

Edit: Maybe only the first empty line.
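A minimal sketch of the idea above, in D. The helper name `displayMatch` is hypothetical and not part of Pegged's API; it assumes the matched text is HTML-escaped before being written into the page, and it prefixes a single visible `&#x23ce;` (⏎) only when the match begins with a line break, so the first displayed line is never empty and can be hovered.

```d
import std.algorithm.searching : startsWith;
import std.array : replace;
import std.stdio : writeln;

/// Hypothetical helper (not Pegged's API): escapes matched text for HTML and,
/// when the match begins with a line break, prefixes one visible "⏎" marker
/// so there is something for the mouse pointer to hover over.
string displayMatch(string matched)
{
    // Escape '&' first so the entities added afterwards are not double-escaped.
    string escaped = matched
        .replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;");

    // Only one marker, even if several empty lines lead the match.
    if (matched.startsWith("\n") || matched.startsWith("\r\n"))
        return "&#x23ce;" ~ escaped;
    return escaped;
}

void main()
{
    // sample2 from the report starts with an empty line, so it gets the marker.
    writeln(displayMatch("\n#!included b/c not first line\n"));
    // sample1 starts with "#!", so it is left unchanged.
    writeln(displayMatch("#!excluded\n"));
}
```

Because the marker is added only for a leading line break, a match that starts with visible text, such as the one in s1.html, is displayed exactly as before.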

ghost commented 6 years ago

Yes, this would avoid the confusion. However, I'm not into HTML at all, so I can't say whether there's a better solution (I don't know exactly what the problem is, either).

veelo commented 6 years ago

OK. (The problem is that an empty line contains nothing the mouse pointer can hover over to make the full match appear. So we need to show something that was not actually part of the match. A space would also work, but it would still be confusing.)

ghost commented 6 years ago

Let's always add the symbol. This is a very particular case, where Pegged is used to do what a scanner would usually do. toHTML was only used to test a rule, and whatever is done to fix this only affects the display, so it will never break anything. I mean that if always adding the symbol ever proves to be the wrong decision, it can still be changed without breaking anything; toHTML is only meant to check the grammar, after all.

veelo commented 6 years ago

I don't follow you. Surely, you wouldn't want the first line of the match in s1.html to be displayed as "⏎#!excluded"?

ghost commented 6 years ago

Yeah, only "⏎" for empty lines, and documented in the wiki, of course.

veelo commented 6 years ago

Oh, right. I'm not in favor of that, as I think it would clutter the output and make it look different from how you would view the input in an editor. And to be consistent, we'd need to display all line breaks and possibly whitespace too. Maybe someone would want that, but I think it would be best enabled as an option (to show all non-printable content everywhere).

ghost commented 6 years ago

This case will be very rare. Let's start with your solution then.