eclipse-langium / langium

Next-gen language engineering / DSL framework
https://langium.org/
MIT License
754 stars 68 forks source link

Generated type from declared type does not include all of them #744

Closed luan-xiaokun closed 1 year ago

luan-xiaokun commented 2 years ago

A minimal example:

Using the following type declaration

grammar Debug

entry Model:
    (items+=AB)*;

interface A {
    a: string;
}
interface B {
    b: string[];
}

type AB = A | B;

A returns A:
    'A' a=ID;

AB returns AB:
    A ({B} b+=ID)*;

// A infers A:
//     'A' a=ID;

// AB:
//     A ({infer B} b+=ID)*;

hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;

generates

export type AB = A;

export interface A extends AstNode {
    readonly $container: Model;
    a: string
}

export interface B extends AstNode {
    b: Array<string>
}

Switching to infers generates

export type AB = A | B;

'A' and 'B' are always subtypes of 'AB' in both ways.

luan-xiaokun commented 2 years ago

When using type declaration, changing the b in AB returns AB: A ({B} b+=ID)*; to other undeclared name such as c does not raise error.

luan-xiaokun commented 2 years ago

As far as I can tell, the skipping below seems to be the cause of this issue. After removing line 103 to 105 the generated type becomes export type AB = A | B; (but breaks my own language's generated ast, this is not a fix). Frankly, I do not really understand the logic behind the skipping.

https://github.com/langium/langium/blob/9f587815f9d14f433d22831ac11d3b9f11696d9f/packages/langium/src/grammar/type-system/inferred-types.ts#L97-L105

luan-xiaokun commented 2 years ago

After looking more closely at the code, I think the cause of the problem may be the distinct function on the stream.

https://github.com/langium/langium/blob/9f587815f9d14f433d22831ac11d3b9f11696d9f/packages/langium/src/grammar/type-system/type-collector.ts#L19-L34

In this minimal example:

  1. When infers is used, everything can be directly inferred by the collectInferredTypes function, so the variable declared now has empty array properties, and nothing goes wrong.
  2. When returns is used, the type AB obtained by collectInferredTypes has the property union containing only one element, namely A, because type B is a declared type and it is ignored. The union property of AB obtained by collectDeclaredTypes contains both A and B as we declared.
  3. The distinct function keeps the AB computed by collectInferredTypes, which is incomplete, and discards the other one, which is supposed to be kept.

I don't really know what are the functions collectInferredTypes and collectInferredTypes supposed to do and not supposed to do (information obtained by declaration may be more accurate than that of inference?), but exchanging their order in the array concatenation works for me and passes the vitest, i.e. Types in declared do not have container types, the following modification will lead to missing container types in generated ast.

const interfaces: InterfaceType[] = declared.interfaces.concat(inferred.interfaces);
const types: UnionType[] = declared.unions.concat(inferred.unions);
montymxb commented 2 years ago

Hi @luan-xiaokun . Yep, this is something that we're aware of as well here, I've come across it before in the past myself. The way we collect types produces an order-dependent result. It allows duplicate declarations of the same type as an inferred & declared instance, with the order here determining which takes precedence over the other. That's not correct, but it is something that should be rectified along with the rest of the broader model type inference changes that we're planning.

luan-xiaokun commented 2 years ago

Hey @montymxb, thanks for your reply. Glad to know that this issue is on the to-do list, and I'm looking forward to the improvement of type inference!