eclipse-langium / langium

Next-gen language engineering / DSL framework
https://langium.org/
MIT License

Syntax highlighting broken for more complicated string types (string with escaping) #797

Closed: goto40 closed this issue 1 year ago.

goto40 commented 2 years ago

- Langium version: 0.5.0
- Package name: langium-cli

## Steps To Reproduce

1. Create the hello-world example (`yo langium`).
2. Change the grammar to:

   ```
   grammar HelloWorld

   entry Model: (persons+=Person | greetings+=Greeting)*;

   Person: 'person' name=STRING;

   Greeting: 'Hello' person=[Person:STRING] '!';

   hidden terminal WS: /\s+/;
   terminal ID: /[_a-zA-Z][\w_]*/;
   terminal INT returns number: /[0-9]+/;
   terminal STRING: /"(\\"|[^"])*"|'(\\'|[^'])*'/;

   hidden terminal ML_COMMENT: /\/\*[\s\S]*?\*\//;
   hidden terminal SL_COMMENT: /\/\/[^\n\r]*/;
   ```

3. Start the VS Code extension and enter a model:

   ```
   person "Pierre with \" inside"
   Hello "Pierre with \" inside"!
   ```




Link to code example: see the steps above. The `STRING` terminal definition is the crucial part.
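Langium terminal rules are written as JavaScript regular expressions, so the lexer's side of the problem can be sketched with a plain `RegExp`. This is a simplified illustration, not Langium's actual lexer: it assumes the `STRING` terminal from the grammar reads `/"(\\"|[^"])*"|'(\\'|[^'])*'/`, and `NAIVE_STRING` and `tokenAt` are hypothetical names used only for comparison.

```typescript
// Assumption: the STRING terminal from the grammar, as a JavaScript RegExp.
// The sticky "y" flag anchors each match at lastIndex, roughly like a lexer.
const STRING = /"(\\"|[^"])*"|'(\\'|[^'])*'/y;

// Hypothetical naive terminal without escape handling, for comparison.
const NAIVE_STRING = /"[^"]*"|'[^']*'/y;

// Returns the token that `re` matches starting exactly at `index`, if any.
function tokenAt(re: RegExp, text: string, index: number): string | undefined {
  re.lastIndex = index;
  const m = re.exec(text);
  return m ? m[0] : undefined;
}

const line = 'person "Pierre with \\" inside"';
const start = line.indexOf('"');

console.log(tokenAt(STRING, line, start)); // "Pierre with \" inside"
console.log(tokenAt(NAIVE_STRING, line, start)); // "Pierre with \"
```

The escape-aware terminal consumes the whole literal, which is consistent with the report that parsing and scoping work; the breakage is confined to the generated highlighting file.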

<!--
  Please provide a link to a repository on GitHub or provide a minimal code 
  example that reproduces the problem. You may provide a screenshot of some 
  application if you think it is relevant to your bug report. Here are some 
  tips for providing a minimal example: https://stackoverflow.com/help/mcve.
-->

## The current behavior

Scope resolution works as expected (misspelling the name produces an error, as it should), but syntax highlighting is broken: everything after the first string is highlighted as a string.

## The expected behavior

Strings containing escaped quotes are highlighted correctly, without breaking the highlighting of the text that follows.

luan-xiaokun commented 2 years ago

This seems to happen because the TextMate grammar (the JSON file in the `syntaxes` folder) is not generated correctly. The terminal rule `STRING` generates the following:

    {
      "name": "string.quoted.double.langium-debug",
      "begin": "\"",
      "end": "\\\\\"[^\"]\""
    },
    {
      "name": "string.quoted.single.langium-debug",
      "begin": "'",
      "end": "\\\\'[^']'"
    }

Basically, this means that anything between an opening `"` and a closing `\"x"` is taken as a string and highlighted as one, where `x` is any character other than `"`. Evidently the alternation operator `|` is missing from the generated TextMate file.
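This diagnosis can be checked directly: decoding the generated JSON `end` value and running it as a regular expression shows that it only matches a backslash, a quote, one non-quote character and another quote. A minimal sketch, assuming a JavaScript `RegExp` behaves like TextMate's Oniguruma engine for this simple pattern; `jsonEnd`, `endSource`, `generatedEnd` and `model` are illustrative names, and `model` is the sample input from the report.

```typescript
// The "end" value of the generated double-quoted string rule, copied verbatim
// from the JSON file. String.raw keeps the backslashes exactly as written there.
const jsonEnd = String.raw`"\\\\\"[^\"]\""`;

// JSON decoding yields the actual regex source: \\"[^"]"
const endSource: string = JSON.parse(jsonEnd);
const generatedEnd = new RegExp(endSource);

// Sample input from the report (two lines).
const model = 'person "Pierre with \\" inside"\nHello "Pierre with \\" inside"!';

// Everything after the opening quote of the first string.
const afterOpen = model.slice(model.indexOf('"') + 1);

console.log(endSource); // \\"[^"]"
console.log(generatedEnd.test(afterOpen)); // false: the string rule never closes
console.log(generatedEnd.test('"a\\"x"')); // true: it only closes at \"x"
```

Because the end pattern never matches in the sample, the rule stays open and everything after the first `"` is scoped as a string, exactly as described in the report.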

msujew commented 2 years ago

Note that the generated syntax highlighting file is just a helper to get you started with writing or extending your TextMate grammar. The heuristics the generator uses to derive highlighting rules are fairly simple and are not intended to be complete. We do accept improvements to the code, though, so if you can figure out what's wrong there, contributions are welcome.
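Following that advice, the broken rule can be replaced by hand. A common TextMate idiom is to end a string at a plain quote and add a nested pattern that consumes escape sequences, so an escaped quote cannot close the string. The sketch below writes such a rule as a TypeScript object (the `hello-world` scope suffix and the escape scope name are assumptions) and checks it with `findStringEnd`, a hypothetical and heavily simplified model of begin/end scanning: the earliest match wins, `end` wins ties, and captures, line-by-line tokenization and `applyEndPatternLast` are ignored.

```typescript
// Minimal shape of a TextMate grammar rule (only the fields used here).
interface TextMateRule {
  name: string;
  begin?: string;
  end?: string;
  match?: string;
  patterns?: TextMateRule[];
}

// Hand-written double-quoted string rule; scope names are illustrative.
const doubleQuotedString: TextMateRule = {
  name: "string.quoted.double.hello-world",
  begin: '"',
  end: '"',
  patterns: [
    // Regex source \\. : a backslash plus any character, e.g. \" or \\
    { name: "constant.character.escape.hello-world", match: "\\\\." },
  ],
};

// Simplified begin/end scanning: returns the index just past the closing
// delimiter, or -1 if the end pattern never matches.
function findStringEnd(text: string, from: number, rule: TextMateRule): number {
  if (rule.end === undefined) throw new Error("not a begin/end rule");
  const end = new RegExp(rule.end, "g");
  const nested = (rule.patterns ?? [])
    .filter((p) => p.match !== undefined)
    .map((p) => new RegExp(p.match as string, "g"));
  let pos = from;
  for (;;) {
    end.lastIndex = pos;
    const e = end.exec(text);
    if (e === null) return -1;
    // Find the earliest nested match that starts before the end match.
    let best = e.index;
    let next = -1;
    for (const re of nested) {
      re.lastIndex = pos;
      const m = re.exec(text);
      if (m !== null && m.index < best) {
        best = m.index;
        next = m.index + Math.max(m[0].length, 1); // guard against empty matches
      }
    }
    if (next < 0) return e.index + e[0].length; // the end pattern wins
    pos = next; // skip the escape sequence and keep scanning
  }
}

const sample = 'person "Pierre with \\" inside"';
const open = sample.indexOf('"');
const close = findStringEnd(sample, open + 1, doubleQuotedString);

console.log(sample.slice(open, close)); // "Pierre with \" inside"
console.log(JSON.stringify(doubleQuotedString, null, 2));
```

The `\\.` pattern also handles an escaped backslash right before the closing quote (`"a\\"`), which a lookbehind-based end pattern such as `(?<!\\)"` would get wrong.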

luan-xiaokun commented 2 years ago

Strings and comments go through the same regex visitor. Since the TextMate generator does a good job with multi-line comments, we could rewrite the string regex to look like the comment regex, e.g. `"(\\"|[^"])*?"` (note the non-greedy quantifier). This might fix the issue, though I haven't figured out how the visitor handles it. (Oops, this doesn't actually work.)

msujew commented 1 year ago

This has been fixed with https://github.com/langium/langium/pull/888.