Alhadis / language-regexp

For those who're serious about writing regular expressions.
https://atom.io/packages/language-regexp
ISC License

Add actual text to the README #2

Closed waldyrious closed 4 years ago

waldyrious commented 4 years ago

Since this is the library used for syntax highlighting of fenced code blocks on GitHub, it should be possible to reproduce the highlighting of the code in the README screenshot.

This is what the screenshot currently shows (barring transcription errors):

\k<n>     \k'n'                                   [[:^word]]   ASCII-range         Full-range        Backslash
\k<-n>    \k'-n'                                  ————————————————————————————————————————————————————————————
\k<name>  \k'name'                                alpha        \p{PosixAlpha}      \p{XPosixAlpha}
                                                  alnum        \p{PosixAlnum}      \p{XPosixAlnum}
\k<n+level> \k'n+level'                           ascii        \p{ASCII}
\k<n-level> \k'n-level'                           blank        \p{PosixBlank}      \p{XPosixBlank}   \h
                                                                                   \p{HorizSpace}
\A(?<a>|.|(?:(?<b>.)\o{22}qr\g<a>\k<b>))\z        cntrl        \p{PosixCntrl}      \p{XPosixCntrl}
\A(?<a>|.|(?:(?<b>.)\o{22}qr\g<a>\k<b+0>))\z      digit        \p{PosixDigit}      \p{XPosixDigit}   \d
                                                  graph        \p{PosixGraph}      \p{XPosixGraph}
\h, \H                                            lower        \p{PosixLower}      \p{XPosixLower}
(?<name>...), (?'name'...)                        print        \p{PosixPrint}      \p{XPosixPrint}
\k<name>                                          punct        \p{PosixPunct}      \p{XPosixPunct}
\g<name>, \g<group-num>                                        \p{PerlSpace}       \p{XPerlSpace}    \s
                                                  space        \p{PosixSpace}      \p{XPosixSpace}
\p{k}                                             upper        \p{PosixUpper}      \p{XPosixUpper}
\pP                                               word         \p{PosixWord}       \p{XPosixWord}    \w
\g{1}                                             xdigit       \p{PosixXDigit}     \p{XPosixXDigit}
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
                                                  (?R)         (?(2)yes|no)        ?                 (...)
\o{2,2}                                           (?n)         (?(+2)yes|no)       ?+                (?<name>...)
\o{22424}                                         (?+2)        (?(-2)yes|no)       ??                (?'name'...)
                                                  (?-2)        (?(<name>)yes|no)   *                 (?P<name>...)
\p{ASCII_Hex_Digit=True}                          (?&name)     (?('name')yes|no)   *+                (?:...)
\p{ASCII_Hex_Digit=False}                         (?P>name)    (?(name)yes|no)     *?                (?|...)
                                                  \g<name>     (?(R)yes|no)        +
(?-imx:subexp)                                    \g'name'     (??(R2)yes|no)      ++                (?i)
                                                  \g<2>        (?(R&name)yes|no)   +?                (?J)
(?<element> \g<stag> \g<content>* \g<etag> ){0}   \g'2'        (?(DEFINE)yes|no)   {1}               (?m)
(?<stag> < \g<name> \s* > ){0}                    \g<+2>       (?(assert)yes|no)   {1,3}             (?s)
(?<name> [a-zA-Z_:]+ ){0}                         \g'+2'       (?(?<=AA)yes|no)    {1,3}+            (?U)
(?<content> [^<&]+ (\g<element> | [^<&]+)* ){0}   \g<-2>       (?(?=AA)yes|no)     {1,3}?            (?x)
(?<etag> </ \k<name+1> >){0}                      \g'-2'                           {1,}              (?-s)
\g<element>                                                                        {1,}+

Would you accept a PR adding this to the README, either alongside the screenshot, or replacing it?

waldyrious commented 4 years ago

This should also help identify some issues with the highlighting, e.g. the backslash escapes not being highlighted, or the discrepancy in how some constructs are highlighted:

(?:...) vs (?|...)
(?R)    vs (?n)

or the lack of highlighting of regex-significant punctuation like ?:()[]{}-, as compared to plain text like "alpha", "alnum", "blank", etc.

Alhadis commented 4 years ago

GitHub's syntax highlighter uses a very spartan colour palette, and many scopes either share identical colours or lack one altogether (such as punctuation characters). You can see the full list of supported scopes (and their styling) here; bear in mind they're subject to change in future at the whim of GitHub's designers.

That being said, I'd prefer to keep the screenshot, as it better illustrates the grammar's capabilities. However, I'm not opposed to a PR to add a sample.regexp file with the text depicted in the screenshot.

> This should also help identify some issues with the highlighting, e.g. the backslash escapes not being highlighted, or the discrepancy in how some constructs are highlighted:

Just an FYI, some of those tokens are being matched, but aren't being assigned any highlighting whatsoever. For example, \o is scoped to constant.character.escape.misc.regexp, which maps to the .pl-cce CSS class, which has no styling applied. IIRC, it used to be coloured a shade of dark purple, but I guess it was removed for being too close to the default text-colour…
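To make the "matched but unstyled" case concrete, here is a minimal sketch of how a scope could resolve to a CSS class and then to no colour at all. It assumes the usual TextMate convention that the longest matching selector prefix wins. Only the `constant.character.escape` to `pl-cce` pairing comes from the comment above; the `keyword` entry, the colour value, and the names `SCOPE_TO_CLASS`, `CLASS_TO_COLOUR`, `css_class_for` and `colour_for` are hypothetical, and none of this is GitHub's actual implementation.

```python
from typing import Optional

# Illustrative selector table. Only the "pl-cce" entry is taken from the
# discussion; the "keyword" entry is an assumption added for contrast.
SCOPE_TO_CLASS = {
    "constant.character.escape": "pl-cce",
    "keyword": "pl-k",  # assumed for illustration
}

# Hypothetical stylesheet: a class missing from this table has no styling.
CLASS_TO_COLOUR = {
    "pl-k": "#d73a49",  # placeholder colour, not authoritative
}


def css_class_for(scope: str) -> Optional[str]:
    """Return the class of the longest selector that is a prefix of `scope`."""
    parts = scope.split(".")
    # Try the full scope first, then drop trailing segments one at a time.
    for end in range(len(parts), 0, -1):
        selector = ".".join(parts[:end])
        if selector in SCOPE_TO_CLASS:
            return SCOPE_TO_CLASS[selector]
    return None


def colour_for(scope: str) -> Optional[str]:
    """Return the colour for `scope`, or None if it is matched but unstyled."""
    css_class = css_class_for(scope)
    if css_class is None:
        return None
    return CLASS_TO_COLOUR.get(css_class)
```

Under these assumptions, `css_class_for("constant.character.escape.misc.regexp")` finds a class while `colour_for` on the same scope finds no colour, which is the situation described for `\o`.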

waldyrious commented 4 years ago

Thanks for the clarifications, @Alhadis! I've submitted #3 with the text file as you suggested.

As for the lack of highlighting, I appreciate the details. Would it be possible to map some of the tokens to CSS classes that are highlighted? Or is that outside the scope/control of this repo?

It also wasn't clear to me whether the discrepancy in the styling of similar constructs is due merely to a lack of styling, or whether there's some consolidation that could be done in this grammar.

Alhadis commented 4 years ago

> Would it be possible to map some of the tokens to CSS classes that are highlighted?

I could, but there's absolutely no guarantee that GitHub's highlighting won't change again in future. It's happened before and I bet it'll happen again…

waldyrious commented 4 years ago

Since PR #3 has been merged, I am ready to close this issue. Just for completeness, though, I'll wait for the response to the last question in my previous comment (about possible discrepancies in how similar regex constructs are tokenized).

To avoid relying on GitHub's styling, here's an edited version of the README screenshot with some of the apparent discrepancies:

[Image: language-regexp copy 3]

(Note that I don't claim to be familiar with all those constructs; I assume that it's quite possible that the current coloring is correct and self-consistent, and simply confusing to the uninitiated due to visual, but not semantic, similarity.)

Alhadis commented 4 years ago

That's... not so easily explained. For the choice of scopes, I've adhered to the names recommended by TextMate, which are chosen based on semantics. Of course, it's a stretch to compare the lexical elements of a programming language with those of regular expressions; I recall having trouble making decisions on precisely what to scope. Should \o{777} be scoped the same as a normal escape sequence, or should it be scoped as though it were a function call? I wanted the highlighting to be as pronounced as possible, so I avoided scoping everything like an escape.
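The two options for `\o{777}` can be sketched side by side. This is only an illustration of the trade-off, not the grammar's actual rules: the scope names and the helpers `scope_as_escape` and `scope_as_call` are hypothetical, chosen to resemble TextMate naming.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (matched text, scope name)

# Matches an octal escape written in the \o{...} form, e.g. \o{777}.
OCTAL_ESCAPE = re.compile(r"\\o\{([0-7]+)\}")


def scope_as_escape(source: str) -> List[Token]:
    """Option 1: treat the whole sequence as one escape token."""
    if OCTAL_ESCAPE.fullmatch(source) is None:
        raise ValueError(f"not an octal escape: {source!r}")
    # Hypothetical scope name.
    return [(source, "constant.character.escape.octal")]


def scope_as_call(source: str) -> List[Token]:
    """Option 2: split it into parts, as if it were a function call."""
    match = OCTAL_ESCAPE.fullmatch(source)
    if match is None:
        raise ValueError(f"not an octal escape: {source!r}")
    # Hypothetical scope names; each part can be coloured separately.
    return [
        ("\\o", "entity.name.function"),
        ("{", "punctuation.section.begin"),
        (match.group(1), "constant.numeric.octal"),
        ("}", "punctuation.section.end"),
    ]
```

The second form yields four separately scoped tokens, so it gives a theme more to colour, at the cost of `\o{777}` no longer looking like the other escapes.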

It's late here and I've no idea if this is making any sense or not.

waldyrious commented 4 years ago

What you're saying makes sense. I was just raising the possibility that there had been an oversight or two in terms of consistency. If all the apparent style discrepancies pointed out above are the result of explicit design decisions that you'd still stand by today, then I consider my concerns addressed.

In any case, the original motivation for this issue has been resolved by #3, so I'll go ahead and close it.