Raku / atom-language

Atom/Github Raku Syntax Highlighting Support 🦋
Other
24 stars 11 forks source link

Invalid regexes as detected by GitHub's grammar compiler #99

Closed lildude closed 1 year ago

lildude commented 1 year ago

Describe what you see, what you want to see, and perhaps some linkage to docs, synopses, or irclog chatter.

👋 I'm the lead maintainer of the https://github.com/github/linguist library which is used for language detection and providing the syntax highlighting for languages on GitHub.com, and we use this grammar.

Our grammar compiler has found a problem with your grammar which I thought I'd let you know about.

- [ ] repository `vendor/grammars/atom-language-perl6` (from https://github.com/perl6/atom-language-perl6) (6 errors)
  - Invalid regex in grammar: `source.raku` (in `grammars/raku.cson`) contains a malformed regex (regex "`(?x) ( [\p{Digit}\p{Alpha}'\-_]+`...": unknown property name after \P or \p (at offset 16))
  - Invalid regex in grammar: `source.raku` (in `grammars/raku.cson`) contains a malformed regex (regex "`[\p{Digit}\p{Alpha}'\-_]+`": unknown property name after \P or \p (at offset 9))
  - Invalid regex in grammar: `source.raku` (in `grammars/raku.cson`) contains a malformed regex (regex "`(?x)(\$|@|%|&)(\.|\*|:|!|\^|~|`...": unknown property name after \P or \p (at offset 57))
  - Invalid regex in grammar: `source.raku` (in `grammars/raku.cson`) contains a malformed regex (regex "`(?x)(?<!\\)(\$|@|%|&)(?!\$)(`...": unknown property name after \P or \p (at offset 75))
  - Invalid regex in grammar: `source.raku` (in `grammars/raku.cson`) contains a malformed regex (regex "`(`": missing ) (at offset 1))
  - Invalid regex in grammar: `source.raku` (in `grammars/raku.cson`) contains a malformed regex (regex "`)`": unmatched parentheses (at offset 0))

GitHub uses PCRE instead of Ruby's Oniguruma regex engine for performance reasons which means some of your regexes aren't interpretted, in this case the \p character classes.

Example Code

N/A

Picture [optional]

Providing a picture means that if the issue is fixed and linguist updates to include the fix (linguist uses this package to highlight Raku on GitHub), your issue will remain historically viewable.

Leave this in. For internal use.

2colours commented 1 year ago

Hi @lildude ,

if I understand correctly, since this highlighter was designed with Oniguruma in mind, this is not really an issue with the content per se.

I'm fairly recent to take up on this topic so would you mind if I ask: how can one describe the same content with PCRE? If it's not possible, the change would have questionable value. Thanks.

lildude commented 1 year ago

https://www.regular-expressions.info/unicode.html#category is a great resource for this.

2colours commented 1 year ago

A follow-up:

Alpha = L(etter) or M(ark) Digit = Nd (decimal number)

Hopefully they would work with Oniguruma as well; honestly, I had to look up what a mark approximately is in the first place. 😅

raiph commented 1 year ago

Anyone reading this, ignore the specifics listed in the opening comment. Instead, click Outstanding Grammar Issues to view the latest Raku errors, if any. At the time of writing this comment the file with errors in it has changed its name to raku.tmLanguage.json and it looks like there are just two errors:

I've PR'd an edit of the file that I think (not tested!) does the minimal change of the things Github is saying are invalid.

I'm actually suspicious the \p{Alpha} was a bit "off" because a simple interpretation of what I read it was suggests it would have picked up a sequence that begins with a mark rather than only marks that follow an initial letter. But I'm not going to investigate that right now.

2colours commented 1 year ago

https://github.com/Raku/atom-language/pull/103 is out now; hopefully we get to see the result of it soon.

2colours commented 1 year ago

https://github.com/github-linguist/linguist/issues/3924#issuecomment-1676993935 seems like we are good now, as far as the syntax goes. Closing this issue.

raiph commented 1 year ago

Ugh. It looks like I misinterpreted the column count of some of the error messages (the column count was ambiguously pointed at the t of \p{Digit} or \ of \p{Alpha} and I thought, it seems incorrectly, it was the latter) and I also didn't understand your comment about Nd (which ironically is why I took a look in the first place, in an attempt to understand what the error message and your comment might have meant).

Anyway, thanks for fixing that and shepherding this to a successful conclusion.