Closed eiriktsarpalis closed 5 months ago
Is the label-tagger bot broken?
It was having issues a few weeks back.
How does this interact with discriminated unions (e.g. [JsonPolymorphic(...)])? I'm not very familiar with the intricacies of STJ contracts and JSON schema vs. OpenAPI, but I believe there are some difficulties lying around here. My primary concern is that we will end up with broken OpenAPI schemas due to the inability to properly express discriminated unions (see https://swagger.io/docs/specification/data-models/inheritance-and-polymorphism/).
@benlongo after a bit of experimentation I ended up using anyOf in the prototype.
Hi @eiriktsarpalis, thanks for looking into this! As a preface, I'm not very familiar with the intricacies of JSON Schema, so take anything I say with a grain of salt :)
Regarding the in-progress OpenAPI work (which I've left a related comment on https://github.com/dotnet/aspnetcore/issues/54598#issuecomment-2080215043), I'm concerned that the difference between JSON Schema and OpenAPI will cause paper cuts around discriminated unions; if the OpenAPI implementation is to naively delegate schema generation for discriminated unions, then things won't work properly. We use a lot of discriminated unions in our data model so I'm very invested in it working properly.
I'll use the example objects (modified slightly) from the OpenAPI 3.1.0 spec (https://swagger.io/specification/#discriminator-object). This translates to the following STJ model.
[JsonPolymorphic(TypeDiscriminatorPropertyName = "petType")]
[JsonDerivedType(typeof(Cat), Cat.PetType)]
[JsonDerivedType(typeof(Dog), Dog.PetType)]
[JsonDerivedType(typeof(Lizard), Lizard.PetType)]
public abstract record Animal;

public record Cat : Animal {
    public const string PetType = "cat";
    public required string Name { get; init; }
}

public record Dog : Animal {
    public const string PetType = "dog";
    public required string Bark { get; init; }
}

public record Lizard : Animal {
    public const string PetType = "lizard";
    public required bool LovesRocks { get; init; }
}
In JSON Schema world, I would expect this to get mapped to something very similar to what you have in your prototype: an anyOf or oneOf with constant string discriminators.
In OpenAPI world, however, discriminated unions are handled differently. I would expect the following OpenAPI schema to be generated for Animal.
Animal:
  oneOf:
    - $ref: '#/components/schemas/Cat'
    - $ref: '#/components/schemas/Dog'
    - $ref: '#/components/schemas/Lizard'
  discriminator:
    propertyName: petType
    mapping:
      cat: '#/components/schemas/Cat'
      dog: '#/components/schemas/Dog'
      lizard: '#/components/schemas/Lizard'
I don't think that the STJ JSON Schema library should be aware of OpenAPI peculiarities, but I definitely think the proper escape hatches need to exist so that the OpenAPI implementation can generate the correct schema. I have no idea what those escape hatches look like, or if they already exist, but I can imagine how a simple implementation of OpenAPI would result in this being difficult or impossible to express. The OpenAPI implementation will have to be aware of the underlying contract somehow to bypass JSON Schema generation for certain cases like this.
As an aside, based on https://json-schema.org/understanding-json-schema/reference/combining it seems like it could make more sense to use oneOf instead of anyOf (at least when discriminators are involved). However, I guess if every value has a discriminator then the consumer of a payload is guaranteed that anyOf implies oneOf. I'm not sure what the material difference of this is in the real world, but I could see anyOf making sense due to the highlighted performance consideration from the linked docs:
Careful consideration should be taken when using oneOf entries as the nature of it requires verification of every sub-schema which can lead to increased processing times. Prefer anyOf where possible.
One place I could see anyOf going wrong in the real world is a TypeScript generator not realizing it can create a union type for all the variants of an anyOf. With oneOf, a TypeScript generator would not have to do any hard work to know that translating to a union is valid.
I don't think that the STJ JSON Schema library should be aware of OpenAPI peculiarities, but I definitely think the proper escape hatches need to exist so that the OpenAPI implementation can generate the following schema.
I agree with that sentiment, it's something we've been looking at solving with @captainsafia. The prototype uses a callback API that lets users append or modify JSON schema documents based on presence of particular properties, although this particular use case makes things trickier.
As an aside, based on https://json-schema.org/understanding-json-schema/reference/combining it seems like it could make more sense to use oneOf instead of anyOf
The problem with oneOf is that you could have two separate derived types whose schema matches a given JSON document (this is possible because type discriminators are optional in STJ).
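To make the overlap concrete, here is a minimal sketch, not a real validator. It assumes each derived type's subschema is reduced to a list of required property names and that extra properties are allowed; `OverlapSketch`, `Matches`, and `CountMatches` are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.Json.Nodes;

public static class OverlapSketch
{
    // Stand-in for a subschema: an object that must contain every listed property.
    public static bool Matches(JsonObject doc, string[] required) =>
        required.All(name => doc.ContainsKey(name));

    // Number of variant subschemas the document satisfies.
    public static int CountMatches(JsonObject doc, string[][] variants) =>
        variants.Count(variant => Matches(doc, variant));

    public static void Main()
    {
        // Two derived types registered without discriminators, both requiring "Name".
        string[][] variants =
        {
            new[] { "Name" }, // e.g. Cat
            new[] { "Name" }, // e.g. a second type with the same shape
        };
        JsonObject doc = JsonNode.Parse("{\"Name\":\"Tom\"}")!.AsObject();

        int matches = CountMatches(doc, variants);
        Console.WriteLine($"anyOf valid: {matches >= 1}"); // at least one match is enough
        Console.WriteLine($"oneOf valid: {matches == 1}"); // exactly one match is required
    }
}
```

With two structurally identical variants the document matches both, so it passes under anyOf and fails under oneOf.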
I'm concerned that the difference between JSON Schema and OpenAPI...
For some additional context, OpenAPI 3.1 is built on JSON Schema 2020-12 by default. Even previous versions of OpenAPI use a modified JSON Schema draft 4.
Discriminated unions aren't a problem that JSON Schema has. They're a problem that C# has.
The discriminator keyword is an OpenAPI addition. JSON Schema evaluation will return the content of the keyword as an annotation, where OpenAPI will continue processing. The oneOf/anyOf performs the actual validation to ensure that the data is expected; OpenAPI uses discriminator combined with the evaluation results to determine which subschema was valid.
The problem with oneOf is that you could have two separate derived types whose schema matches a given JSON document (this is possible because type discriminators are optional in STJ).
For the fully general case, anyOf definitely makes sense. Perhaps oneOf should be reserved for cases where all variants have a defined discriminator?
I'm also not sure how one would define numeric discriminators in the OpenAPI mapping property.
For some additional context, OpenAPI 3.1 is built on JSON Schema 2020-12 by default. Even previous versions of OpenAPI use a modified JSON Schema draft 4.
Thanks for the context there. I took a quick skim through draft-05 through 2020-12 and didn't notice anything that should impact schema composition, but I also don't know how exactly the draft-04 version was modified in earlier versions of OpenAPI.
Discriminated unions aren't a problem that JSON Schema has. They're a problem that C# has.
Just so I understand what you're getting at here, my understanding is that serializing JSON discriminated unions used to be a problem for C# (particularly STJ), but is no longer an issue with [JsonPolymorphic]. I agree they are definitely not an issue for JSON Schema to express, but it is problematic that OpenAPI has this distinct method for encoding them despite JSON Schema already being fully capable.
The discriminator keyword is an OpenAPI addition. JSON Schema evaluation will return the content of the keyword as an annotation, where OpenAPI will continue processing. The oneOf/anyOf performs the actual validation to ensure that the data is expected; OpenAPI uses discriminator combined with the evaluation results to determine which subschema was valid.
Thanks for explaining the annotation behavior - I was not aware of that. If I'm understanding this correctly, the OpenAPI additions (discriminator, etc.) are valid additions according to the JSON Schema spec as they are annotations. In this case, the escape hatches required to make discriminated unions work in OpenAPI world and JSON Schema world correctly may not have to be as extreme as I was imagining.
It sounds like you are describing one possible use of the OpenAPI document at runtime where there is a module validating based on JSON Schema, and then another module adding OpenAPI information onto these results (correct me if I'm wrong there). The only use case I have experience with is client code generation from the OpenAPI document. In this scenario, the JSON Schema may not be used for validation (directly at least), but rather for typing/parsing information. I don't believe that exact flow will always be taken, but the annotations being structurally valid JSON Schema is definitely relevant.
It seems like the addition of discriminator annotations for OpenAPI could potentially be done directly through the JSON Schema library as a post processing step. For example, it could pattern match on oneOf unions where all variants share a common property with constant value, but that seems like it could be brittle and expensive to compute. A cleaner alternative may be to hold onto the STJ contract alongside the generated JSON Schema as context so the OpenAPI generator can reach behind the curtain to figure out if it has a polymorphic type on its hands. Ultimately I think it comes down to whether the JSON Schema implementation wants to expose STJ contract or not.
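As a rough illustration of the "reach behind the curtain" option, the sketch below reads the polymorphism metadata that STJ already exposes on JsonTypeInfo (.NET 7 or later) and builds an OpenAPI-style discriminator object from it. `DiscriminatorSketch` and `BuildDiscriminator` are hypothetical helpers, not part of STJ, the prototype, or any OpenAPI library, and the `#/components/schemas/<TypeName>` reference format is an assumption taken from the example schema earlier in the thread.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;
using System.Text.Json.Serialization;
using System.Text.Json.Serialization.Metadata;

[JsonPolymorphic(TypeDiscriminatorPropertyName = "petType")]
[JsonDerivedType(typeof(Cat), "cat")]
[JsonDerivedType(typeof(Dog), "dog")]
public abstract record Animal;
public record Cat : Animal { public required string Name { get; init; } }
public record Dog : Animal { public required string Bark { get; init; } }

public static class DiscriminatorSketch
{
    // Hypothetical helper: returns null when the type is not polymorphic or when
    // any derived type lacks a discriminator (the mapping cannot be expressed then).
    public static JsonObject? BuildDiscriminator(Type baseType)
    {
        var options = new JsonSerializerOptions { TypeInfoResolver = new DefaultJsonTypeInfoResolver() };
        JsonPolymorphismOptions? poly = options.GetTypeInfo(baseType).PolymorphismOptions;
        if (poly is null) return null;

        var mapping = new JsonObject();
        foreach (JsonDerivedType derived in poly.DerivedTypes)
        {
            if (derived.TypeDiscriminator is null) return null;
            // Discriminators may be strings or ints; the mapping keys are always strings.
            string key = derived.TypeDiscriminator.ToString()!;
            mapping[key] = $"#/components/schemas/{derived.DerivedType.Name}"; // assumed ref format
        }

        return new JsonObject
        {
            ["propertyName"] = poly.TypeDiscriminatorPropertyName,
            ["mapping"] = mapping,
        };
    }

    public static void Main() => Console.WriteLine(BuildDiscriminator(typeof(Animal)));
}
```

This avoids pattern matching on the generated schema, at the cost of requiring the generator to hand the contract to its caller.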
The prototype uses a callback API that lets users append or modify JSON schema documents based on presence of particular properties.
Do these callbacks expose strictly JSON Schema information, or can STJ contract data be accessed through this interface?
Another possibly relevant issue (not really a bug per se) that I've run into with STJ is that sub-types of a [JsonPolymorphic] discriminated union do not have their discriminator serialized. If you have an endpoint that returns a specific sub-type in addition to the endpoint that returns the full union, you may run into issues if you expect that discriminator to be there.
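A minimal sketch of that behavior, assuming .NET 7 or later and a trimmed-down version of the model above; `SubtypeSketch` and the sample value are illustrative only.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

[JsonPolymorphic(TypeDiscriminatorPropertyName = "petType")]
[JsonDerivedType(typeof(Cat), "cat")]
public abstract record Animal;
public record Cat : Animal { public required string Name { get; init; } }

public static class SubtypeSketch
{
    // Serialized through the polymorphic base contract: the discriminator is written.
    public static string AsBase(Cat cat) => JsonSerializer.Serialize<Animal>(cat);

    // Serialized through the derived type's own contract: no discriminator.
    public static string AsDerived(Cat cat) => JsonSerializer.Serialize(cat);

    public static void Main()
    {
        var cat = new Cat { Name = "Tom" };
        Console.WriteLine(AsBase(cat));    // {"petType":"cat","Name":"Tom"}
        Console.WriteLine(AsDerived(cat)); // {"Name":"Tom"}
    }
}
```

A schema for the Cat endpoint that lists petType as required would therefore not match what the serializer emits for a value declared as Cat.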
I took a quick skim through draft-05 through 2020-12 and didn't notice anything that should impact schema composition, but I also don't know how exactly the draft-04 version was modified in earlier versions of OpenAPI.
Draft 5 is basically the OpenAPI-specific draft 4 variant. Draft 5 is basically never supported outside of OpenAPI.
serializing JSON discriminated unions used to be a problem for C#
What I mean is that C# doesn't support unions, as such. Using discriminator as a mechanism to express polymorphism is one of the use cases, sure.
In this scenario, the JSON Schema may not be used for validation (directly at least), but rather for typing/parsing information.
Importantly, JSON Schema isn't a typing system. It's a constraints system. Henry Andrews' excellent blog post explains this difference well.
And code generation (either direction) isn't defined by any specification (yet), so whoever implements it is free to do what they want.
For the fully general case, anyOf definitely makes sense. Perhaps oneOf should be reserved for cases where all variants have a defined discriminator?
We could try to make the generator a bit more clever and emit oneOf where applicable, but I'm not sure what this would achieve from a validation perspective. Each element would be mutually exclusive in that case regardless, so being consistent with anyOf seems like a better trade-off.
There is an important distinction between anyOf and oneOf: oneOf requires that an implementation evaluate all of the subschemas, whereas anyOf can be short-circuited. In general, anyOf is preferred, especially when it contains subschemas which are already mutually exclusive.
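The evaluation-cost difference can be sketched with plain predicates standing in for subschemas. This is an illustration under that assumption, not how any particular validator is implemented; `CombinatorSketch` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class CombinatorSketch
{
    // anyOf: stop at the first subschema that accepts the value.
    public static (bool Valid, int Evaluated) AnyOf<T>(T value, IReadOnlyList<Func<T, bool>> subschemas)
    {
        int evaluated = 0;
        foreach (Func<T, bool> subschema in subschemas)
        {
            evaluated++;
            if (subschema(value)) return (true, evaluated);
        }
        return (false, evaluated);
    }

    // oneOf: success can only be confirmed after every subschema has been checked,
    // because a later subschema might also accept the value.
    public static (bool Valid, int Evaluated) OneOf<T>(T value, IReadOnlyList<Func<T, bool>> subschemas)
    {
        int matches = 0;
        foreach (Func<T, bool> subschema in subschemas)
        {
            if (subschema(value)) matches++;
        }
        return (matches == 1, subschemas.Count);
    }

    public static void Main()
    {
        // Mutually exclusive "subschemas" keyed on a discriminator value.
        var variants = new Func<string, bool>[]
        {
            petType => petType == "cat",
            petType => petType == "dog",
            petType => petType == "lizard",
        };
        Console.WriteLine(AnyOf("cat", variants)); // (True, 1)
        Console.WriteLine(OneOf("cat", variants)); // (True, 3)
    }
}
```

When the variants are already mutually exclusive, both combinators accept the same documents, and anyOf does so with fewer subschema evaluations in this sketch.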
Background and motivation
The recent popularity of function calling capabilities in LLMs as well as the upcoming OpenAPI work in ASP.NET Core has highlighted the importance of a System.Text.Json component that is capable of exporting its own serialization contracts (JsonTypeInfo) to JSON schema documents. Such a component should ideally satisfy the following criteria:
JsonSerializerOptions and POCO attribute annotations (e.g. JsonNamingPolicy, JsonNumberHandling, JsonPropertyName, JsonIgnore, etc.)
I wrote a prototype that attempts to address the above design goals, and this was largely achieved by tapping into the metadata exposed by the STJ contract model. That being said, the existing contract APIs do not expose all metadata that is necessary to construct a schema, so in many cases the implementation had to resort to private reflection or outright replication of STJ internals. At the same time, the core mapping logic itself requires acute understanding of STJ esoterica, so it cannot be expected that such a component could be sustainably maintained by third-party authors.
I'm creating this issue to track .NET 9 work related to JSON schema extraction. The scope is related to and overlaps with https://github.com/dotnet/runtime/issues/29887 but doesn't necessarily coincide with it. At a high level, it is tracking the following goals (in order of importance):
JsonTypeInfo contracts to JSON schema documents. Most users should be able to use that directly, but it would also serve as a reference implementation for those that want to map to bespoke formats (e.g. OpenAPI YAML).
JsonSchema exchange type. This is a stretch goal for .NET 9 since it would likely necessitate implementing support for the full JSON schema specification (whereas a mapper need only target a subset of the spec).
Work Items
Nullable<T> contract metadata (element type, custom element converters, etc.)