Closed ArcturusZhang closed 3 months ago
The change in this PR is allowing us to do this:
namespace My.Service;
model Foo {
foo: EmbeddingVector<int32>;
}
in the generated code, but preventing us from doing this:
namespace My.Service;
model MyEmbeddingVector is EmbeddingVector<int32>;
model Foo {
foo: MyEmbeddingVector;
}
it seems we need to introduce template model/array/dict in tcgc to handle such situation.
Similar situation happens in MPG templates, if some one writes:
model VirtualMachine is TrackedResource<VirtualMachineProperties>
{}
which is the standard way to define a resource, but in the model we get from TCGC, there is nothing related with TrackedResource
as if it is not defined in this way.
I think this must be fixed
it seems we need to introduce template model/array/dict in tcgc to handle such situation.
well in my picture, we could just add a SourceType
in those types that applyable with is
, such as array
, model
and maybe dictionary
(dictionary is questionable because is
in this context means "additional properties" for models). In this way we could have:
interface SdkArrayType {
name: string;
namespace?: string;
elementType: SdkType;
sourceType?: SdkArrayType;
}
and for the language that would care the implementation of an array, they need to do this in their generator (taking csharp as example:
public CSharpType CreateCSharpType(InputType inputType) => inputType switch {
// omit other cases
InputListType listType => CreateListType(listType),
// omit other cases
};
public CSharpType CreateListType(InputListType listType)
{
// check name and namespace to see if there is a hit
var knownType = (listType.Name, listTypeNamespace) switch {
("EmbeddingVector", "Azure.Core") => new CSharpType(typeof(ReadOnlyMemory<>)).MakeGenericType(CreateType(listType.ValueType);
_ => null;
}
if (knownType != null) return knownType;
// if there is a chain of is, check the next
if (listType.SourceType != null) return CreateListType(listType.SourceType);
// return the final fallback, which is an ordinary list type
return new CSharpType(typeof(IList<>)).MakeGenericType(CreateType(listType.ValueType);
}
Similar things could happen to models by adding a SourceType
to show if this model has a is
. But maybe the decisions around additional properties does not have to change because is Record<T>
should always be the last, and we might want the sourceType
to have the same type as the enclosing type, considering the type system we are exposing from TCGC does not treat arrays and dictionaries as models.
Maybe in a more generic way, we could have something like this:
interface IsApplicableType<T extends SdkType> {
sourceType?: T
}
then SdkArrayType
, SdkModelType
(maybe others) could extends this new interface as well as the SdkTypeBase
(I believe interfaces could have multiple inheritance in typescript?), such as:
interface SdkArrayType extends SdkTypeBase, IsApplicableType<SdkArrayType> {
}
hhm so you should NOT be caring that something was done a model is
this is the point it copies the info, Embedding vector has some decorator that should carry over so you can do isEmbeddingVector
.
after offline sync, i believe we should use alias
instead of is
to prevent introducing a new type.
for mgmt, all resources are extended from Azure.ResourceManger.TrackedResource
, etc.
hhm so you should NOT be caring that something was done a
model is
this is the point it copies the info, Embedding vector has some decorator that should carry over so you can doisEmbeddingVector
.
Also FYI @timotheeguerin the decorator solution is not so good because of two reasons in this case:
Some explanations: I do not think it is a good idea that we end up with this pattern: if a new type is added in some core or template (like azure.core's embedding vector in this case), a TCGC update must be involved to support it. We are trying to add name and namespace to the array type - this is extensible from the SDK generator's perspective because this does not need TCGC to do anything when a new type is added, the downstream generator just gets the information. If the SDK generator is properly updated to handle that specific name and namespace's array (embedding vector), it handles it to a special type, otherwise it just sees this as an ordinary array and everything should work just as before.
I think this is important because we could imagine that in the future, 3rd party might write them own template (let us say openai writes their own openai.core template which introduces new scalars new array and new models). It is so bad if a TCGC or emitter update must be done to support that because in our current MGC's design (MGC stands for Microsoft.Generator.CSharp), we require them to have a MGC plugin to handle their own special types, it will be very cumbersome that they have to do the emitter side work by themselves. For types, I think name and namespace should solve it, but for decorators I do not have good solutions, maybe this is but I am not sure.
It is private by design because you should not use it. My point is that if you do model is WmbeeddingVector it copies the decorator and calling the isEmbeddingVector returns true
For the general case of known type we have a more general design issue we need to resolve but it probably would be in the same pattern using some decorator provided by the compiler
Taking an example: There is a definition in azure.core templates:
and I could use it in my own typespec:
and in the generator, we could get the correct name and namespace for the type of property foo, we know it is an array, and we know its name and namespace which differentiates this type from a raw array. This is great.
but if we do this in our own typespec, everything falls apart:
now the type of property foo is an array with a name of
MyEmbeddingVector
and a namespace ofMy.Service
. The middle layerAzure.Core.EmbeddingVector<int32>
just disappears.This has the same outcome as if we are writing:
It should be well-known that the contents of the above 2 typespec files are different.
We need a solution for this.