BenjaminSchaaf / sbnf

A BNF-style language for writing sublime-syntax files

Syntax gets into bad state after branching #49

Open digitalcora opened 6 months ago

digitalcora commented 6 months ago

I was experiencing some strange issues trying to build a syntax, and narrowed it down to this example grammar. The only thing it recognizes is an "import statement": the keyword import, then a single "item" that is either aliased (x as y) or unaliased (x), then a semicolon. In either case we want to scope the "usable" name of the entity with entity.name (the name it would be usable under, if this language actually had a way to use the things you import).

IDENT = '[a-z]+'
main : (`import`{keyword} maybe-aliased-item `;`{punctuation.terminator})* ;
maybe-aliased-item : IDENT{entity.other} `as`{keyword} IDENT{entity.name} | IDENT{entity.name} ;

The generated syntax has a weird property: you can write as many "unaliased" imports as you like, and they are all scoped correctly. But when you write an "aliased" import, although that import itself is scoped correctly (including the semicolon), it leaves some extra contexts on the stack. This breaks the syntax for a while, until it encounters enough "invalid" characters to pop back to main.

   import a as b; import c; import d;
// ^^^^^^ keyword
//        ^ entity.other
//          ^^ keyword
//             ^ entity.name
//              ^ punctuation.terminator
//                ^^^^^^ entity.name (?!)
//                       ^^ invalid.illegal
//                          ^^^^^^ keyword

In this example it would also be possible to match the whole "item" using regexes and thus avoid branching, but in my real syntax the alias may be found on a different line (and separated by many other tokens) from the original identifier.
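For reference, the scoping I would expect from the grammar above can be sketched as a tiny backtracking matcher in Python. This is only an illustration of the intended result, not how SBNF or Sublime Text implement branching. The function names (`seq`, `maybe_aliased_item`, `scopes`) are made up for this sketch, and the whitespace skipping and the `\b` word boundaries on the keywords are my own assumptions.

```python
import re

IDENT = r"[a-z]+"
WS = re.compile(r"\s*")

# The two alternatives of maybe-aliased-item, tried in grammar order.
ALIASED = [(IDENT, "entity.other"), (r"as\b", "keyword"), (IDENT, "entity.name")]
UNALIASED = [(IDENT, "entity.name")]


def seq(text, pos, steps):
    """Match each (pattern, scope) step in order; return None if any step fails."""
    tokens = []
    for pattern, scope in steps:
        pos = WS.match(text, pos).end()  # assumption: whitespace is skipped
        m = re.compile(pattern).match(text, pos)
        if m is None:
            return None
        tokens.append((m.group(), scope))
        pos = m.end()
    return pos, tokens


def maybe_aliased_item(text, pos):
    """Try the aliased branch first; on failure, rewind and try the unaliased one."""
    for branch in (ALIASED, UNALIASED):
        result = seq(text, pos, branch)
        if result is not None:
            return result
    return None


def scopes(text):
    """Scope every complete import statement; stop at the first one that fails."""
    pos, tokens = 0, []
    while True:
        head = seq(text, pos, [(r"import\b", "keyword")])
        if head is None:
            break
        item = maybe_aliased_item(text, head[0])
        if item is None:
            break
        end = seq(text, item[0], [(";", "punctuation.terminator")])
        if end is None:
            break
        tokens += head[1] + item[1] + end[1]
        pos = end[0]
    return tokens


for token, scope in scopes("import a as b; import c; import d;"):
    print(f"{token:<8}{scope}")
```

The key point is that each statement starts from a clean state: once an item has matched, nothing from the failed or successful branch is left over, so the second `import` is scoped as a keyword again.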