DataType 'returns' is Potentially Misleading

montymxb commented 2 years ago

While discussing some details yesterday over another project, it came up that the form of writing a DataType rule could mislead new users into thinking it returns a primitive type, rather than the DataType itself. I believe this goes against the notion of having a low barrier to entry for newcomers, as it could confuse and frustrate them.

For example, the following produces a value of the type SomeDataType, but is in itself a subtype of string. To make this work, we write 'returns' like so.

SomeDataType returns string: ID

For somebody unfamiliar with Langium, they may come to believe that SomeDataType actually returns string, given the literal wording. This is further reinforced by our consistent usage of 'returns' elsewhere for parser rules & terminals, where this is always the case.

My suggestion is that we at least consider whether we should:

use a different keyword than returns to signify a subtyping relationship (although internally we just use a type alias).
allow dropping returns type, and being able to infer the super type from utilized keywords, terminal rules, and data type rules
utilize highlighting to distinguish between DataType and ParserRules (this is present in Xtext)

luan-xiaokun commented 2 years ago

Couldn't agree with you more, as a new user, it took me an hour to figure this out. But I still don't quite get the difference between types and rules.

montymxb commented 2 years ago

@luan-xiaokun thanks for chiming in! It's of particular interest to make sure that new users are able to pick up Langium quite easily 🙂 . We will also be updating our documentation to hopefully make this clearer going forward as well.

If you haven't read it yet, we do have an entry on Semantic Model Inference that goes more into how Types can declared, or inferred from rules. That may help clear up the difference between the two.

spoenemann commented 2 years ago

@montymxb I don't get the problem. Isn't the type of ID in your example above exactly string?

montymxb commented 2 years ago

@spoenemann the type of ID is string in this case. The problem is we useSomeDataType as a new type, and it's not the same as string (even though internally this is a type alias to string). This effectively creates a subtype, but the wording here seems to suggests it's just a string, which does not reflect this relationship.

Currently, the inferred type system will resolve DataTypes as they are, not the type that they're based on. So although it is technically a string, it has it's own distinct subtype.

It's also confounded by the fact the keyword returns has a different meaning on where we use it (Parser rules vs. DataType rules). It makes sense for the former case, because we're returning a value of that type, but in the 2nd case it's not the same.

spoenemann commented 2 years ago

Ok so you mean something like this?

// Declared
type SomeDataType = 'a' | 'b' | 'c'
SomeDataType returns SomeDataType: 'a' | 'b' | 'c';

// Inferred
SomeDataType returns string: 'a' | 'b' | 'c';

The declared variant looks good to me. The inferred variant is indeed a bit different from the other usages of returns. But it's not wrong: the value returned by the data type rule is a string. The inferrer only creates a subtype of string when the value is an alternative of string literals. This does not apply to this declaration, for example:

SomeDataType returns string: 'a' | 'b' | ID;

Of course the relationship between grammar declaration and generated TypeScript declarations is more hidden if you use inference. But that's why we introduced type declarations in the grammar. Those who prefer things to be explicit can simply choose the variant at the beginning of my post.

luan-xiaokun commented 2 years ago

@montymxb Thank you for your advice, the document is very helpful to me, I'm using langium to implement a vscode extension for a modeling language of our own design

montymxb commented 2 years ago

Following up, we had some internal discussions last week about this. We haven't 100% decided this, but we're leaning towards keeping the returns syntax for DataTypes, but making sure the DataType correctly returns the declared type (removing the notion of a sub-type here). This would remove the ambiguity reported here about what is actually returned, and keeps some other parts of Langium working nicely as well.

spoenemann commented 1 year ago

We release the current syntax as v1.0; we should keep it unless we find good reasons to change it.

eclipse-langium / langium

DataType 'returns' is Potentially Misleading #605