Status: open. stutrek opened this issue 4 hours ago.
We extended what constitutes whitespace in https://github.com/eclipse-langium/langium/pull/1589. This likely causes `ID` to be automatically moved to the "top" of the lexer, where it then shadows the `Title` rule. The reordering is a performance optimization: the lexer should attempt whitespace-related tokens first.
I'd be inclined to say that the way you're defining `IdBoundary` is a bit of an anti-pattern. We can likely improve on this in the lexer, but the rule matches a lot of characters you probably don't want matched there, such as the zero-width space. That is why the token builder identifies it as a "whitespace token".
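To make this mechanism concrete, here is a small self-contained TypeScript model. It is not Langium's token builder. The whitespace character list, the `looksLikeWhitespace`, `orderTerminals` and `firstMatch` helpers, and the `[^.,;:]+` pattern standing in for `IdBoundary` are all hypothetical. The sketch makes three assumptions: a terminal counts as whitespace when its regex matches any single whitespace character, such terminals are moved ahead of all others, and the lexer uses first-match semantics (the default in Chevrotain, on which Langium's lexer is built).

```typescript
// Illustrative model only, NOT Langium's source. The list below is an
// assumption chosen for this sketch.
const WHITESPACE_CHARS: string[] = [
  ' ', '\t', '\n', '\r', '\f', '\v',
  '\u00a0', // no-break space
  '\u2028', '\u2029', // line / paragraph separators
  '\u3000', // ideographic space
];

interface TerminalDef {
  name: string;
  pattern: RegExp;
}

// Hypothetical check: a pattern is "whitespace-like" if it matches
// any single whitespace character.
function looksLikeWhitespace(pattern: RegExp): boolean {
  return WHITESPACE_CHARS.some((ws) => {
    pattern.lastIndex = 0; // guard against stateful /g or /y regexes
    return pattern.test(ws);
  });
}

// Whitespace-like terminals are moved in front of all others,
// regardless of where they were declared.
function orderTerminals(terminals: TerminalDef[]): TerminalDef[] {
  const front: TerminalDef[] = [];
  const back: TerminalDef[] = [];
  for (const t of terminals) {
    (looksLikeWhitespace(t.pattern) ? front : back).push(t);
  }
  return [...front, ...back];
}

// First-match lexing: the first terminal (in order) that matches at
// position 0 of the input wins.
function firstMatch(input: string, ordered: TerminalDef[]): string | undefined {
  for (const t of ordered) {
    const sticky = new RegExp(t.pattern.source, 'y');
    if (sticky.test(input)) return t.name;
  }
  return undefined;
}

// Hypothetical grammar: `Title` declared first, `IdBoundary` listing
// characters to exclude rather than characters to accept.
const terminals: TerminalDef[] = [
  { name: 'Title', pattern: /title/ },
  { name: 'IdBoundary', pattern: /[^.,;:]+/ },
];

console.log(orderTerminals(terminals).map((t) => t.name)); // [ 'IdBoundary', 'Title' ]
console.log(firstMatch('title', terminals)); // Title
console.log(firstMatch('title', orderTerminals(terminals))); // IdBoundary
```

Because `orderTerminals` moves whitespace-like terminals to the front no matter where they were declared, swapping `Title` and `IdBoundary` in the input list leaves the result unchanged in this model. That is consistent with the "flipping the order of the rules does nothing" symptom.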
Note that you can create a workaround by overriding this part of the token builder:
I see, that makes a lot of sense. Is the anti-pattern listing characters to ignore rather than characters to match? This situation is challenging because the rule is used where people want spaces and punctuation. Additionally, many of our users write in non-Roman alphabets. If you have any suggestions, I'd like to improve it.
For now, I added some optional whitespace to the title rule so that it is also prepended, but that doesn't fix my out-of-memory issue :( I'll make a minimal reproduction for that.
Update: using your list of whitespace characters, I made our IDs start with a non-whitespace character, and it now compiles as expected.
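This fix can be checked against a simplified model of the whitespace check. The sketch below is hypothetical: the `before` pattern is an assumed negated class, and the `after` pattern requires the first character to be neither whitespace nor one of the listed punctuation marks. It uses JavaScript's `\s` in place of the explicit whitespace list mentioned in the thread, which is also an assumption.

```typescript
// Simplified model, not Langium's source: the whitespace list below is an
// assumption (every entry is also matched by JavaScript's \s).
const WHITESPACE_CHARS: string[] = [
  ' ', '\t', '\n', '\r', '\f', '\v', '\u00a0', '\u2028', '\u2029', '\u3000',
];

// Hypothetical check: a pattern is "whitespace-like" if it matches any
// single whitespace character.
function looksLikeWhitespace(pattern: RegExp): boolean {
  return WHITESPACE_CHARS.some((ws) => {
    pattern.lastIndex = 0; // guard against stateful /g or /y regexes
    return pattern.test(ws);
  });
}

// Hypothetical ID patterns.
// Before: any run of characters except a few punctuation marks.
const before = /[^.,;:]+/;
// After: the first character must not be whitespace or punctuation;
// later characters may still include spaces.
const after = /[^\s.,;:][^.,;:]*/;

console.log(looksLikeWhitespace(before)); // true
console.log(looksLikeWhitespace(after)); // false

// Non-Roman scripts and inner spaces are still accepted.
console.log(after.exec('Привет мир')?.[0]); // Привет мир
```

A single whitespace character can never satisfy the required non-whitespace first character, so the `after` pattern is no longer classified as whitespace and is not moved ahead of `Title`.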
Unreachable rule: flipping the order of the rules does nothing. Worked in 3.1, breaks in 3.2.0.
Langium version: 3.2.0
Package name: langium
Steps To Reproduce
The current behavior
The expected behavior
It compiles