BenjaminSchaaf / sbnf

A BNF-style language for writing sublime-syntax files
MIT License

Regex captures example doesn't work #48

Open digitalcora opened 6 months ago

digitalcora commented 6 months ago

The tutorial includes this section on regex captures:

main : '([a-zA-Z]+)(?:(\.)([a-zA-Z]+))*'
       {meta.path, 1: entity.name, 2: punctuation, 3: entity.name}
     ;

This will have meta.path assigned to the whole regex, entity.name assigned to each of the [a-zA-Z]+ and punctuation assigned to \..

But this doesn't actually do what it says: when tested against a file consisting of one.two.three.four, only one and four are given the entity.name scope, and only the final dot is given the punctuation scope.
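The behavior described above can be reproduced outside Sublime Text. The sketch below uses Python's `re` module purely as a stand-in (an assumption: it is not the engine Sublime uses, but it follows the same common convention that a capture group inside a repeated group keeps only its last iteration). The helper name `captured_spans` is made up for this illustration.

```python
import re

# Same pattern as the tutorial example. Python's `re` is only a stand-in
# for Sublime's regex engine here (assumption for illustration).
PATTERN = re.compile(r'([a-zA-Z]+)(?:(\.)([a-zA-Z]+))*')


def captured_spans(text):
    """Return {group number: (start, end)} for every group that matched."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return {}
    return {
        i: m.span(i)
        for i in range(1, PATTERN.groups + 1)
        if m.group(i) is not None
    }


text = "one.two.three.four"
for group, (start, end) in captured_spans(text).items():
    # Groups 2 and 3 sit inside the repeated (?:...)* group, so only
    # their final iteration is reported.
    print(group, repr(text[start:end]))
```

Only three spans come back for the whole path: `one`, the last dot, and `four`. The middle segments are matched but have no capture of their own, which is consistent with the scopes seen in the test file.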

From searching around it seems like Sublime's regex engine is probably implemented this way on purpose for performance reasons, but then this makes me wonder how to do what the tutorial suggests for real. If you do it this way:

IDENT = '[a-zA-Z]+'
main : (IDENT{entity.name} `.`{punctuation})* IDENT{entity.name} ;

...then whitespace is allowed between the punctuation and entity names. Although you could disallow whitespace if you were writing the sublime-syntax by hand, I can't find a way to do this within SBNF. Is it possible?

BenjaminSchaaf commented 6 months ago

You're right, this is a limitation of regexes in ST. You could unroll the loop manually up to a certain depth:

([a-zA-Z]+)(?:(?:(?:(\.)([a-zA-Z]+))?(\.)([a-zA-Z]+))?(\.)([a-zA-Z]+))?
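As a rough sketch of how the unrolled pattern above could be generated for an arbitrary depth, the hypothetical helper `unrolled_path_regex` below builds it by nesting one optional group per extra segment. It is checked with Python's `re`, which is an assumption for illustration only and not Sublime's engine.

```python
import re


def unrolled_path_regex(depth):
    """Build the path regex with `depth` optional '.segment' parts unrolled.

    Every dot and every segment gets its own capture group, so none of
    them is overwritten by a later iteration.
    """
    tail = ''
    for _ in range(depth):
        # Wrap what we have so far in one more optional '.segment' layer.
        tail = '(?:' + tail + r'(\.)([a-zA-Z]+))?'
    return '([a-zA-Z]+)' + tail


pattern = unrolled_path_regex(3)
match = re.fullmatch(pattern, "one.two.three.four")
print(match.groups())

# A path with more segments than the unrolled depth is not fully matched.
print(re.fullmatch(pattern, "a.b.c.d.e"))
```

With depth 3 this yields exactly the pattern shown above, with seven capture groups. Presumably each of those groups would then need its own scope entry in the SBNF rule, and paths longer than the chosen depth are not covered.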

Currently there's no way to make SBNF disallow whitespace between tokens.

digitalcora commented 6 months ago

Good to know, thanks. I think this would be a good idea for an improvement! I'm currently translating a tree-sitter grammar into sublime-syntax, and it similarly allows whitespace between tokens, but includes a token.immediate function to indicate there should not be any whitespace between the previous token and this one. Maybe, like the "passive" prefix ~<expr>, there could be an "immediate" prefix !<expr>? (symbol chosen arbitrarily)