Azure / typespec-azure

About TypeSpec Azure Libraries
https://azure.github.io/typespec-azure/
MIT License
11 stars 37 forks source link

TCGC should be able to show the chain of `is` instead of swallowing it #1005

Closed ArcturusZhang closed 3 months ago

ArcturusZhang commented 3 months ago

Taking an example: There is a definition in azure.core templates:

namespace Azure.Core;
model EmbeddingVector<T> is Array<T>;

and I could use it in my own typespec:

namespace My.Service;

model Foo {
    foo: EmbeddingVector<int32>;
}

and in the generator, we could get the correct name and namespace for the type of property foo, we know it is an array, and we know its name and namespace which differentiates this type from a raw array. This is great.

but if we do this in our own typespec, everything falls apart:

namespace My.Service;

model MyEmbeddingVector is EmbeddingVector<int32>;

model Foo {
    foo: MyEmbeddingVector;
}

now the type of property foo is an array with a name of MyEmbeddingVector and a namespace of My.Service. The middle layer Azure.Core.EmbeddingVector<int32> just disappears.

This has the same outcome as if we are writing:

namespace My.Service;

model MyEmbeddingVector is Array<int32>;

model Foo {
    foo: MyEmbeddingVector;
}

It should be well-known that the contents of the above 2 typespec files are different.

We need a solution for this.

ArcturusZhang commented 3 months ago

The change in this PR is allowing us to do this:

namespace My.Service;

model Foo {
    foo: EmbeddingVector<int32>;
}

in the generated code, but preventing us from doing this:

namespace My.Service;

model MyEmbeddingVector is EmbeddingVector<int32>;

model Foo {
    foo: MyEmbeddingVector;
}
tadelesh commented 3 months ago

it seems we need to introduce template model/array/dict in tcgc to handle such situation.

ArcturusZhang commented 3 months ago

Similar situation happens in MPG templates, if some one writes:

model VirtualMachine is TrackedResource<VirtualMachineProperties>
{}

which is the standard way to define a resource, but in the model we get from TCGC, there is nothing related with TrackedResource as if it is not defined in this way. I think this must be fixed

ArcturusZhang commented 3 months ago

it seems we need to introduce template model/array/dict in tcgc to handle such situation.

well in my picture, we could just add a SourceType in those types that applyable with is, such as array, model and maybe dictionary (dictionary is questionable because is in this context means "additional properties" for models). In this way we could have:

interface SdkArrayType {
    name: string;
    namespace?: string;
    elementType: SdkType;
    sourceType?: SdkArrayType;
}

and for the language that would care the implementation of an array, they need to do this in their generator (taking csharp as example:

public CSharpType CreateCSharpType(InputType inputType) => inputType switch {
    // omit other cases
    InputListType listType => CreateListType(listType),
    // omit other cases
};

public CSharpType CreateListType(InputListType listType)
{
    // check name and namespace to see if there is a hit
    var knownType = (listType.Name, listTypeNamespace) switch {
        ("EmbeddingVector", "Azure.Core") => new CSharpType(typeof(ReadOnlyMemory<>)).MakeGenericType(CreateType(listType.ValueType);
        _ => null;
    }

    if (knownType != null) return knownType;

    // if there is a chain of is, check the next
    if (listType.SourceType != null) return CreateListType(listType.SourceType);

    // return the final fallback, which is an ordinary list type
    return new CSharpType(typeof(IList<>)).MakeGenericType(CreateType(listType.ValueType);
}

Similar things could happen to models by adding a SourceType to show if this model has a is. But maybe the decisions around additional properties does not have to change because is Record<T> should always be the last, and we might want the sourceType to have the same type as the enclosing type, considering the type system we are exposing from TCGC does not treat arrays and dictionaries as models.

Maybe in a more generic way, we could have something like this:

interface IsApplicableType<T extends SdkType> {
    sourceType?: T
}

then SdkArrayType, SdkModelType (maybe others) could extends this new interface as well as the SdkTypeBase (I believe interfaces could have multiple inheritance in typescript?), such as:

interface SdkArrayType extends SdkTypeBase, IsApplicableType<SdkArrayType> {
}
timotheeguerin commented 3 months ago

hhm so you should NOT be caring that something was done a model is this is the point it copies the info, Embedding vector has some decorator that should carry over so you can do isEmbeddingVector.

tadelesh commented 3 months ago

after offline sync, i believe we should use alias instead of is to prevent introducing a new type. for mgmt, all resources are extended from Azure.ResourceManger.TrackedResource, etc.

ArcturusZhang commented 3 months ago

hhm so you should NOT be caring that something was done a model is this is the point it copies the info, Embedding vector has some decorator that should carry over so you can do isEmbeddingVector.

Also FYI @timotheeguerin the decorator solution is not so good because of two reasons in this case:

  1. the decorator in this case is defined as private.
  2. I do not think the decorators are good for extensibility speaking from the SDK generator's perspective.

Some explanations: I do not think it is a good idea that we end up with this pattern: if a new type is added in some core or template (like azure.core's embedding vector in this case), a TCGC update must be involved to support it. We are trying to add name and namespace to the array type - this is extensible from the SDK generator's perspective because this does not need TCGC to do anything when a new type is added, the downstream generator just gets the information. If the SDK generator is properly updated to handle that specific name and namespace's array (embedding vector), it handles it to a special type, otherwise it just sees this as an ordinary array and everything should work just as before.

I think this is important because we could imagine that in the future, 3rd party might write them own template (let us say openai writes their own openai.core template which introduces new scalars new array and new models). It is so bad if a TCGC or emitter update must be done to support that because in our current MGC's design (MGC stands for Microsoft.Generator.CSharp), we require them to have a MGC plugin to handle their own special types, it will be very cumbersome that they have to do the emitter side work by themselves. For types, I think name and namespace should solve it, but for decorators I do not have good solutions, maybe this is but I am not sure.

timotheeguerin commented 3 months ago

It is private by design because you should not use it. My point is that if you do model is WmbeeddingVector it copies the decorator and calling the isEmbeddingVector returns true

timotheeguerin commented 3 months ago

For the general case of known type we have a more general design issue we need to resolve but it probably would be in the same pattern using some decorator provided by the compiler