eclipse-langium / langium

Next-gen language engineering / DSL framework
https://langium.org/
MIT License

Maximum call stack size exceeded when using many optional groups #775

Closed. dhuebner closed this issue 6 months ago

dhuebner commented 1 year ago

Langium version: current master

When using many optional groups, collectInferredTypes (inferred-types.ts, line 184) creates many different types. Currently the number seems to grow as 2ⁿ, where n is the number of optional groups (see the sketch after the example grammar below).

With Node.js v14 and 17 optional groups, a "RangeError: Maximum call stack size exceeded" is thrown. Below is an example grammar for testing:

grammar Test

entry Model:
    (title1=INT ';')?
    (title2=INT ';')?
    (title3=INT ';')?
    (title4=INT ';')?
    (title5=INT ';')?
    (title6=INT ';')?
    (title7=INT ';')?
    (title8=INT ';')?
    (title9=INT ';')?
    (title10=INT ';')?
    (title11=INT ';')?
    (title12=INT ';')?
    (title13=INT ';')?
    (title14=INT ';')?
    (title15=INT ';')?
    (title16=INT ';')?
    (title17=INT ';')?
    /* 
    (title18=INT ';')?
    (title19=INT ';')?
    (title20=INT ';')?
    (title21=INT ';')?
    (title22=INT ';')?
    (title23=INT ';')?
    (title24=INT ';')?
    (title25=INT ';')?
    (title26=INT ';')?
    */
;
terminal INT returns number: ('0'..'9')+;
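
For illustration of the blow-up described above, here is a hedged sketch (not the actual Langium internals): the inference appears to enumerate one candidate shape per combination of matched/unmatched optional groups before merging them, so two optional groups already yield four candidates.

// Hypothetical TypeScript sketch of the candidate shapes for two optional groups.
// collectInferredTypes works on internal type descriptions, not on TypeScript
// types; this only illustrates the 2^n growth reported above.
type ModelCandidates =
    | {}                                    // neither group matched
    | { title1: number }                    // only the first group matched
    | { title2: number }                    // only the second group matched
    | { title1: number; title2: number };   // both groups matched

// With 17 optional groups this grows to 2^17 = 131072 candidates, which
// exhausts the call stack during the recursive collection.
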
tavoda commented 6 months ago

Hello, is somebody working on this? Without this you can't use Langium for any serious grammar.

msujew commented 6 months ago

@tavoda I've provided a fix. We'll probably release version 3.0.1 soon, which contains it.

Without this you can't use Langium for any serious grammar.

Note that I would strongly disagree with this. We're working quite intensively with Langium and have designed some pretty "serious grammars", and so far the only time we've encountered this issue was when we migrated a legacy grammar from Xtext to Langium, which we needed to rewrite anyway.

tavoda commented 6 months ago

@msujew thanks for the fix. How can I rewrite such rules? This is of course an old Xtext project that I would like to port, but I don't see other options. Please check e.g. DslAttribute: https://github.com/sculptor/sculptor/blob/f57ce67563f1ffa0a6a52d1ea81ad14dbe31b8e7/sculptor-eclipse/org.sculptor.dsl/src/org/sculptor/dsl/Sculptordsl.xtext#L336 You don't support reordered optional parameters; I can live without that, but how do I migrate this grammar?

msujew commented 6 months ago

@tavoda I'd rewrite all these <key> = <value> mappings using a new rule:

DslAttribute :
  (doc=STRING)?
  (visibility=DslVisibility)? (collectionType=DslCollectionType "<")? type=DslType (">")? name=ID
    (properties+=DslAttributeProperty)*
  (";")?;

DslAttributeProperty: NOT ref=[PropertyDefinition:ID] | ref=[PropertyDefinition:ID] ('=' value=Literal)?;
...
PropertyDefinition: name=ID ':' type=Type;

These PropertyDefinition elements are part of a standard library delivered with your language. Everything else is handled via type-based validation rules and cross-references. That's how we design most languages nowadays that contain a lot of structural data.
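
As a rough sketch of such a type-based validation (a hedged example, assuming a generated Langium project; the names SculptorServices, SculptorAstType, SculptorValidator and the "flag property" rule are made up for illustration):

import type { ValidationAcceptor, ValidationChecks } from 'langium';
import type { SculptorAstType, DslAttributeProperty } from './generated/ast.js';
import type { SculptorServices } from './sculptor-module.js';

// Register one check per AST type; Langium invokes it for every DslAttributeProperty node.
export function registerValidationChecks(services: SculptorServices): void {
    const registry = services.validation.ValidationRegistry;
    const validator = services.validation.SculptorValidator;
    const checks: ValidationChecks<SculptorAstType> = {
        DslAttributeProperty: validator.checkAttributeProperty
    };
    registry.register(checks, validator);
}

export class SculptorValidator {
    checkAttributeProperty(prop: DslAttributeProperty, accept: ValidationAcceptor): void {
        // ref=[PropertyDefinition:ID] becomes a cross-reference; .ref holds the resolved node.
        const definition = prop.ref?.ref;
        if (!definition) {
            return; // unresolved references are already reported by the linker
        }
        // Made-up example rule: flag-like properties must not be assigned a value.
        const flagProperties = new Set(['required', 'nullable']);
        if (flagProperties.has(definition.name) && prop.value !== undefined) {
            accept('error', `Property '${definition.name}' does not take a value.`, {
                node: prop,
                property: 'value'
            });
        }
    }
}

The grammar stays small and permissive; anything that used to be its own optional group in the Xtext rule becomes a PropertyDefinition in the standard library plus a check like the one above.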

tavoda commented 6 months ago

It's just moving syntactical rules to the semantic layer. For me, BNF rules are exactly for this purpose: to express as much as possible in syntax. We could rewrite nearly the whole grammar (and all C-like grammars) to:

CLikeGrammars: NOT ref=[PropertyDefinition:ID] | ref=[PropertyDefinition:ID] ('=' value=Literal)? | Punct | String;
Punct: /\b[!#$%&()*+,-./:;<=>?@[\]^_{\|}~]\b/;

but that is not the purpose of a parser, because then you degrade it to a tokenizer. Thanks again for the fix, looking forward to testing it.

msujew commented 6 months ago

For me, BNF rules are exactly for this purpose: to express as much as possible in syntax.

IMO there's a big difference between BNF used purely for parsing (i.e. something like a compiler) and BNF used for editing (e.g. Xtext or Langium). In our experience (i.e. at TypeFox), the parser for an editor should allow a lot of "invalid" input, which is then validated against a set of known rules. This serves two purposes:

  1. You usually get much better error recovery from the parser when you are a bit more flexible with your language.
  2. The validation messages can be much better than auto-generated syntax errors (both with regard to mistyped property names and to the values assigned to them).

With that in mind, I think my suggestion serves that grammar better than just embedding all that information into the grammar directly. No offense here, that's exactly what I did when I started out with Xtext. But in my experience, making things more flexible usually leads to a better product/end result.

That is not the purpose of a parser, because then you degrade it to a tokenizer.

It's honestly pretty difficult to tell what the purpose of a parser is. To some, it's just validating some string input. Others want a visitor-like pattern that allows them to tell which rule, or which part of a rule, has been invoked; this is usually the academic perspective. Some even want a full AST. We serve the last requirement, together with all the editing and linking services.

tavoda commented 6 months ago

Yes, I'm more on the AST side ;-). After your explanation I see it: for an editor it is better to have your approach, with validation on the semantic layer.

snarkipus commented 6 months ago

IMO there's a big difference between BNF used purely for parsing (i.e. something like a compiler) and BNF used for editing (e.g. Xtext or Langium). In our experience (i.e. at TypeFox), the parser for an editor should allow a lot of "invalid" input, which is then validated against a set of known rules. This serves two purposes:

  1. You usually get much better error recovery from the parser when you are a bit more flexible with your language.
  2. The validation messages can be much better than auto-generated syntax errors (both with regard to mistyped property names and to the values assigned to them).

This is such a great take. While this may be immediately obvious to some, it took me a while to grasp, but it fundamentally changed how I think about tooling in general.