eclipse-langium / langium

Next-gen language engineering / DSL framework
https://langium.org/
MIT License

Improve type inference of data type rules #752

Closed pluralia closed 1 year ago

pluralia commented 2 years ago

The idea came from the discussion. Currently, ON returns string: 'on'; infers

export type ON = string;

Instead, we could infer it like

export type ON = 'on'

dhuebner commented 2 years ago

@pluralia Datatype rules can also have more complex expression parts (optional, many, etc.). How are those handled?

RULE returns string: 'opt'? 'many'* 'one'+;

A more advanced example (even from the referenced issue) would be really nice.

pluralia commented 2 years ago

@dhuebner, here are more examples.

MyEnum returns string: ON|OFF;
MyKeywordEnum returns string: ON | OFF | 'enum';
MyPatternEnum returns string: ON | OFF | 'opt'? 'many'* 'one'+;
ON returns string: 'on';  // define 'on', used at different location in my grammar
OFF returns string: 'off'; // define 'off', used at different location in my grammar

should generate

export type MyEnum = ON | OFF
export type MyKeywordEnum = ON | OFF | 'enum'
export type MyPatternEnum = string
export type ON = 'on'
export type OFF = 'off'

  1. If we can get a concrete word from the data type rule body, we use it (types ON and OFF).
  2. If the rule uses a more complicated pattern, as in your example, we fall back to string (type MyPatternEnum).
  3. If the rule refers to other data type rules, we don't substitute their content and just use their names (types MyEnum and MyKeywordEnum).
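
The three rules above can be sketched in TypeScript roughly as follows. This is only an illustration under stated assumptions: the Element model, inferType and inferParts are hypothetical names invented for this sketch and are not Langium's actual AST or generator API.

```typescript
// Hypothetical, simplified model of a data type rule body (NOT Langium's real AST).
type Element =
    | { kind: 'keyword'; value: string; cardinality?: '?' | '*' | '+' }
    | { kind: 'ruleCall'; rule: string; cardinality?: '?' | '*' | '+' }
    | { kind: 'group'; elements: Element[]; cardinality?: '?' | '*' | '+' }
    | { kind: 'alternatives'; elements: Element[]; cardinality?: '?' | '*' | '+' };

// Returns the union members for a rule body, or undefined if the body is a pattern.
function inferParts(element: Element): string[] | undefined {
    // Rule 2: any cardinality makes the body a pattern.
    if (element.cardinality) {
        return undefined;
    }
    switch (element.kind) {
        case 'keyword':
            // Rule 1: a concrete word becomes a string literal type.
            return [`'${element.value}'`];
        case 'ruleCall':
            // Rule 3: keep the name of the called rule, do not inline its content.
            return [element.rule];
        case 'alternatives': {
            let result: string[] = [];
            for (const alternative of element.elements) {
                const parts = inferParts(alternative);
                if (!parts) {
                    return undefined; // one pattern alternative spoils the whole union
                }
                result = result.concat(parts);
            }
            return result;
        }
        case 'group':
            // Rule 2: a sequence of several elements is a pattern.
            return element.elements.length === 1 ? inferParts(element.elements[0]) : undefined;
    }
}

// Renders the right-hand side of the generated type alias.
function inferType(body: Element): string {
    const parts = inferParts(body);
    return parts ? parts.join(' | ') : 'string';
}

// The rules from the example above, written in the hypothetical model.
const on: Element = { kind: 'keyword', value: 'on' };
const myEnum: Element = { kind: 'alternatives', elements: [
    { kind: 'ruleCall', rule: 'ON' },
    { kind: 'ruleCall', rule: 'OFF' }
] };
const myKeywordEnum: Element = { kind: 'alternatives', elements: [
    { kind: 'ruleCall', rule: 'ON' },
    { kind: 'ruleCall', rule: 'OFF' },
    { kind: 'keyword', value: 'enum' }
] };
const myPatternEnum: Element = { kind: 'alternatives', elements: [
    { kind: 'ruleCall', rule: 'ON' },
    { kind: 'ruleCall', rule: 'OFF' },
    { kind: 'group', elements: [
        { kind: 'keyword', value: 'opt', cardinality: '?' },
        { kind: 'keyword', value: 'many', cardinality: '*' },
        { kind: 'keyword', value: 'one', cardinality: '+' }
    ] }
] };

console.log(inferType(on));            // 'on'
console.log(inferType(myEnum));        // ON | OFF
console.log(inferType(myKeywordEnum)); // ON | OFF | 'enum'
console.log(inferType(myPatternEnum)); // string
```

In this sketch a single pattern alternative makes the whole rule fall back to string, which matches the expected output for MyPatternEnum.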
dhuebner commented 2 years ago

@pluralia I personally don't like it when a generator behaves differently because it finds a special pattern and then generates what it thinks I need. Also, implementing more and more optimizations of this kind may lead to a state where nobody can tell what will come out after a change to the grammar. The case described in #720 is where the user expects an enum behavior of a datatype rule and I can imagine a lot of cases where it is a good idea, so what about introducing an enum datatype rule?

pluralia commented 2 years ago

The case described in https://github.com/langium/langium/discussions/720 is where the user expects an enum behavior of a datatype rule and I can imagine a lot of cases where it is a good idea, so what about introducing an enum datatype rule?

@dhuebner, yes, I like this idea. For example, this

MyEnum returns string: ON | OFF;
enum MyRealEnum: ON | OFF;
ON returns string: 'on' ('and' 'on')*;
OFF returns string: 'off'+;

will generate

export type MyRealEnum = ON | OFF
export type MyEnum = string

So, adding the keyword enum restricts the rule body to a union of data type rules, and we can infer the type as a union of those rules without trying to evaluate a pattern.

msujew commented 2 years ago

Not a big fan of reintroducing enum rules to the grammar. We've removed them for a reason and introducing them again kind of defeats the purpose of generating specific types for data type rules. I'm not 100% against it, but I'd like to have clear semantics on this, which add actual value.

MyEnum returns string: ON | OFF;
enum MyRealEnum: ON | OFF;
ON returns string: 'on' ('and' 'on')*;
OFF returns string: 'off'+;
// generates
export type MyRealEnum = ON | OFF
export type MyEnum = string

Doesn't really add any value, as ON and OFF are still just string, so it's still effectively:

export type MyRealEnum = string
export type MyEnum = string
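
The collapse can be checked directly in TypeScript. The snippet below is a small sketch, not generator output; the Equals helper and the constant names are illustrative additions.

```typescript
// What would be generated for the pattern-based rules ON and OFF.
type ON = string;
type OFF = string;
type MyRealEnum = ON | OFF;

// Type-equality helper: resolves to true only if A and B are identical types.
type Equals<A, B> =
    (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2) ? true : false;

// This assignment compiles only because MyRealEnum is exactly string.
const collapsesToString: Equals<MyRealEnum, string> = true;

// Any string is accepted, so the alias gives no extra type safety.
const notReallyOn: MyRealEnum = 'neither on nor off';
```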

If you really want to introduce enum rules, it should look like this:

enum MyRealEnum: ON, OFF;
// generates
// Transforms snake_case into camelCase
enum MyRealEnum {
  on = 1,
  off = 2
}

Or with an additional, alternative syntax:


enum MyRealEnum: a=ON, b=OFF;
// generates
// Keeps the original names
enum MyRealEnum {
  a = 1,
  b = 2
}

spoenemann commented 2 years ago

The reason I was in favor of not supporting enums in the grammar: I usually avoid enums in TypeScript and use a union of string literal types instead.
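
As an illustration of that preference, here is a minimal sketch of a union of string literal types with a runtime guard. All names (MY_ENUM_VALUES, MyEnum, isMyEnum) are made up for this example and are not something Langium generates.

```typescript
// Single source of truth for the allowed values (illustrative names).
const MY_ENUM_VALUES = ['on', 'off'] as const;

// Union of string literal types derived from the array: 'on' | 'off'.
type MyEnum = typeof MY_ENUM_VALUES[number];

// Runtime type guard that narrows an arbitrary string to MyEnum.
function isMyEnum(value: string): value is MyEnum {
    return (MY_ENUM_VALUES as readonly string[]).indexOf(value) !== -1;
}

// Plain string literals are assignable; no enum object has to be imported.
const state: MyEnum = 'on';
```

Unlike a TypeScript enum, the union itself has no runtime representation; only the values array exists at runtime, and a parsed string can be used wherever MyEnum is expected once isMyEnum has narrowed it.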

pluralia commented 2 years ago

Not a big fan of reintroducing enum rules to the grammar.

Okay, adding enums is not what we want. However, I'd still expect that MyEnum returns string: 'on' would generate the type 'on' for MyEnum and not just string. It's like an enum with only one element, and if for MyEnum returns string: 'on' | 'off' we generate the type 'on' | 'off', why not do the same for a single alternative?

pluralia commented 2 years ago

Also, currently we can't declare alternatives of datatype rules on the semantic model level as:

export type MyEnum = ON | OFF;

where ON and OFF were defined in the grammar as

ON returns string: 'on';    // 'on' can represent any pattern
OFF returns string: 'off';  // 'off' can represent any pattern

I can't write MyEnum: ON | OFF -- it requires a return type, but if I add a return type, export type MyEnum = string will be generated.

From my perspective, we should add enums that provide such functionality OR add this functionality in current implementation of data type rules.

pluralia commented 2 years ago

add this functionality in current implementation of data type rules.

For example, we could allow omitting the return type for MyEnum: ON | OFF, that is, for a data type rule that contains only alternatives of other data type rules. In this case, export type MyEnum = ON | OFF; will be generated.

spoenemann commented 2 years ago

Yes, it would be great to improve the inference here. This is allowed, but the generated types are all string:

MyEnum returns string: ON | OFF;
ON returns string: 'on';
OFF returns string: 'off';

And it would also be great to enable the extraction of explicit types like this (currently the editor shows unhelpful error messages):

type MyEnum = ON | OFF;
type ON = 'on';
type OFF = 'off';

MyEnum returns MyEnum: ON | OFF;
ON returns ON: 'on';
OFF returns OFF: 'off';