OAI / OpenAPI-Specification

The OpenAPI Specification Repository
https://openapis.org
Apache License 2.0
28.94k stars 9.07k forks source link

Proposal to improve wording of 'discriminator object' section #2247

Closed sebastien-rosset closed 4 months ago

sebastien-rosset commented 4 years ago

It would be worthwhile to clarify the wording of the discriminator object section in the OAI specification, regardless of the long term direction. Some of us have repeatedly done a line-by-line review of the discriminator object section in the existing OAI spec; unfortunately different people cannot seem to agree on a single interpretation, or they say the spec is ambiguous. As a result, implementations differ on the validation logic (and also in related use cases such as unmarshaling). I am aware of the many issues about discriminators, including #2143, but I haven't seen a proposal to improve the wording.

With this in mind, we've tried to identify the sentences in the spec that cause ambiguity, explain why we think there are ambiguities, and propose ways to improve the wording. We used a PR to hopefully trigger a discussion: #2216

Even if discriminator is completely removed, I think it's still worth clarifying because products tend to stick to an OAS version for a long time, and it would help to bring consistency across tools.

sebastien-rosset commented 4 years ago

@tedepstein, in #2141 two of your sentences caught my attention:

        # Require type==obj1, so a standard JSON schema validator can discriminate.
        # This duplicates the discriminator logic, so should not be required with 
        # an OAS-compliant validator.
        type: 
          enum:
          - obj1

I think you are providing two clarifications points about the OAI spec:

  1. You are stating the discriminator keyword can always be replaced with a JSON schema functional equivalent. In your example, this is done by defining a unique enum constraint in the concrete subtype; the discriminator keyword could be removed from the OAS doc and compliant tools should continue to work the same way functionally. In other words, validators can discriminate amongst the oneOf schema solely based on the JSON schema, independent of the discriminator keyword. There may be significant performance degradation, but functionally it is equivalent.
  2. If the payload matches more than one oneOf schema and the discriminator keyword is present, a compliant validator MUST use the discriminator to discriminate against multiple oneOf schemas. In your example, it would be ok to remove the unique enum constraint from the subtype and a compliant OAS tool would still be able to discriminate.

Am I interpreting your statements correctly?

IMO item (1) is not controversial. Item (2) is controversial. Whenever we have discussion amongst tooling developers, there are disagreements about the correct interpretation of the discriminator object section of the OAI spec. In my discussions, I hear two different interpretations of the OAI spec:

  1. Even if the discriminator is present, the validator must still validate the payload against all oneOf schemas, and a validation error must be returned if the payload matches more than one oneOf schema.
  2. When the discriminator is present, the tool is not obligated to validate the payload against all of the oneOf schemas. The tool can skip JSON schema validation and use the discriminator. There are variants mostly for performance reasons, but functionally they are equivalent: a) first validate the JSON schema, then use the discriminator if the payload matches more than one oneOf schema. b) directly use the discriminator and bypass JSON schema validation. c) same as b, but also validate the JSON schema and log warnings instead of returning error.

It would be great to clarify the discriminator object section of the OAI spec to eliminate the ambiguity. Maybe the TSC already has a very clear interpretation, in which case it's simply a matter of improving the wording. Or maybe there are a few points that need further discussion, I don't know.

tedepstein commented 4 years ago

@sebastien-rosset ,

  1. You are stating the discriminator keyword can always be replaced with a JSON schema functional equivalent.

Yes, functionally equivalent for validation purposes. But not necessarily equivalent for code generators and other processors, unless those processors use a validator to determine the subtype. My other comments in #2141 explain this in more detail.

  1. If the payload matches more than one oneOf schema and the discriminator keyword is present, a compliant validator MUST use the discriminator to discriminate against multiple oneOf schemas. In your example, it would be ok to remove the unique enum constraint from the subtype and a compliant OAS tool would still be able to discriminate.

...I hear two different interpretations of the OAI spec:

  1. Even if the discriminator is present, the validator must still validate the payload against all oneOf schemas, and a validation error must be returned if the payload matches more than one oneOf schema.
  2. When the discriminator is present, the tool is not obligated to validate the payload against all of the oneOf schemas. The tool can skip JSON schema validation and use the discriminator.

I think the first interpretation is correct. Wherever JSON Schema and OAS Schema Object have syntax in common, the semantics should be exactly the same. A oneOf assertion has a clear meaning in JSON Schema, and an instance matching more than on oneOf subschema must fail validation. If an OpenAPI processor claims to be a "validator," it must apply the validation rules defined by JSON Schema, plus any additional validation rules defined by OpenAPI.

The discriminator keyword is not defined by JSON Schema, so it's OK for that keyword to introduce additional validation semantics, annotation metadata, or expected behaviors for code generators or other kinds of processors. But the presence or absence of discriminator should not change the interpretation of oneOf. We don't want OpenAPI message validators to break compatibility with supported JSON Schema keywords, and we don't want to encourage people to write OpenAPI documents that assume non-compliant validation behaviors.

A code generator or some other processor may not claim to be a validator, and in that case, it may choose not to enforce oneOf and/or other assertions. It may generate code that tolerates oneOf violations, leaving that enforcement up to some other component. The oneOf assertion is still applicable and enforceable, it's just that the code generator, or the code produced by the generator, is not taking on the responsibility of that enforcement.

The comments you quoted were from an example of a "hybrid" schema that I wrote in order to illustrate how a single schema could use native JSON Schema for validation purposes, but also include discriminator for code generators and/or other processors. An OpenAPI Schema Object that uses discriminator might not use oneOf, and I suspect most of them do not. In that case, a standard JSON Schema validator wouldn't be able to determine the subtype, and therefore wouldn't be able to apply the subtype schema and validate against it. Only an OAS-compliant validator could do that.

So a schema that uses discriminator doesn't have to use oneOf. But if it does use oneOf, the standard validation logic defined by JSON Schema applies.

sebastien-rosset commented 4 years ago

@sebastien-rosset ,

  1. You are stating the discriminator keyword can always be replaced with a JSON schema functional equivalent.

Yes, functionally equivalent for validation purposes. But not necessarily equivalent for code generators and other processors, unless those processors use a validator to determine the subtype. My other comments in #2141 explain this in more detail.

  1. If the payload matches more than one oneOf schema and the discriminator keyword is present, a compliant validator MUST use the discriminator to discriminate against multiple oneOf schemas. In your example, it would be ok to remove the unique enum constraint from the subtype and a compliant OAS tool would still be able to discriminate.

...I hear two different interpretations of the OAI spec:

  1. Even if the discriminator is present, the validator must still validate the payload against all oneOf schemas, and a validation error must be returned if the payload matches more than one oneOf schema.
  2. When the discriminator is present, the tool is not obligated to validate the payload against all of the oneOf schemas. The tool can skip JSON schema validation and use the discriminator.

I think the first interpretation is correct. Wherever JSON Schema and OAS Schema Object have syntax in common, the semantics should be exactly the same. A oneOf assertion has a clear meaning in JSON Schema, and an instance matching more than on oneOf subschema must fail validation. If an OpenAPI processor claims to be a "validator," it must apply the validation rules defined by JSON Schema, plus any additional validation rules defined by OpenAPI.

ok. In that case IMO there is one recommendation for OAS authors that could greatly improve the runtime performance of the validation process and reduce CPU amplification attacks. Assume all oneOf child schemas declare a single enum value for the discriminator property (as in your example in #2141). In that case, the validator can use the discriminator information as a hint to improve validation performance. When the validator receives input data, instead of validating the input data against the oneOf children one by one, it can unmarshal the value of the discriminator property and it will know there can be at most one match (because every oneOf child has a single value for the enum). The validator can decide to validate all oneOf schemas anyway, but that would be wasted processing.

BTW, declaring a single enum value is particularly helpful when the oneOf children have additionalProperties: true either explicitly or implicitly. The single enum value is a foolproof way to ensure there will be a single match amongst the oneOf children.

If on the other hand, the oneOf children do not declare a unique enum value, then it forces the validator to validate the oneOf schemas one by one. This could be very costly because in that case there is no particular knowledge of the ordering of JSON schema constraints. The validator may be unlucky and disambiguate the input data only after having processed every single constraint of the oneOf schema. An attacker may even use knowledge about how the validator applies validation rules to create a CPU amplification attack against the service, possibly causing a DoS attack.

Don't you think some of these considerations and recommendations should be in the OAI spec? The spec is very light on that topic. For example, every RFC must include a security consideration section with guidelines about DoS attacks and countermeasures.

The discriminator keyword is not defined by JSON Schema, so it's OK for that keyword to introduce additional validation semantics, annotation metadata, or expected behaviors for code generators or other kinds of processors. But the presence or absence of discriminator should not change the interpretation of oneOf. We don't want OpenAPI message validators to break compatibility with supported JSON Schema keywords, and we don't want to encourage people to write OpenAPI documents that assume non-compliant validation behaviors.

A code generator or some other processor may not claim to be a validator, and in that case, it may choose not to enforce oneOf and/or other assertions. It may generate code that tolerates oneOf violations, leaving that enforcement up to some other component. The oneOf assertion is still applicable and enforceable, it's just that the code generator, or the code produced by the generator, is not taking on the responsibility of that enforcement.

The comments you quoted were from an example of a "hybrid" schema that I wrote in order to illustrate how a single schema could use native JSON Schema for validation purposes, but also include discriminator for code generators and/or other processors. An OpenAPI Schema Object that uses discriminator might not use oneOf, and I suspect most of them do not.

In our OAS docs we use discriminator and oneOf a lot. I wouldn't be able to comment on whether other authors do this or not. I am particularly interested in being able to improve the wording of the spec because I have noticed inconsistencies across validators and OAS documents.

In that case, a standard JSON Schema validator wouldn't be able to determine the subtype, and therefore wouldn't be able to apply the subtype schema and validate against it. Only an OAS-compliant validator could do that.

So a schema that uses discriminator doesn't have to use oneOf. But if it does use oneOf, the standard validation logic defined by JSON Schema applies.>

handrews commented 4 years ago

@sebastien-rosset

Don't you think some of these considerations and recommendations should be in the OAI spec? The spec is very light on that topic. For example, every RFC must include a security consideration section with guidelines about DoS attacks and countermeasures.

After OAS 3.1 catches up to the most recent JSON Schema, which supports modular extension vocabularies, we expect additional work on one or more code generation vocabularies, which will (hopefully) replace discriminator. A proposal to indicate how to more quickly process a oneOf is among the existing proposals (I wrote one up two years ago, but we went with if/then/else instead, which is a more flexible way to control validation order.

Your proposal can be implemented in modern JSON Schema by using and if/then construct in each branch of the oneOf, where the property is checked with const in the if branch:

type: object
oneOf:
  - if: {properties: {foo: {const: x}}}
    then: {$ref: #/components/schemas/a}
    else: false
  - if: {properties: {foo: {const: y}}}
    then: {$ref: #/components/schemas/b}
    else: false
  - if: {properties: {foo: {const: z}}}
    then: {$ref: #/components/schemas/c}
    else: false

(the else: false insures that oneOf semantics hold as failing the if does not cause the whole schema object to fail; this else could be omitted with an anyOf instead of a oneOf)

This makes existing validators only perform a cheap const check before proceeding, and is an idiom that code generators can recognize as well. A code generation vocabulary would add annotation keywords to explicitly note the intent of this structure so a code generator knows for certain whether this is a discriminator-ish structure or just something that coincidentally looks like one.

sebastien-rosset commented 4 years ago

After OAS 3.1 catches up to the most recent JSON Schema, which supports modular extension vocabularies, we expect additional work on one or more code generation vocabularies, which will (hopefully) replace discriminator. A proposal to indicate how to more quickly process a oneOf is among the existing proposals (I wrote one up two years ago, but we went with if/then/else instead, which is a more flexible way to control validation order.

I liked the proposal you had two years ago, it provided a simple, intuitive, declarative syntax that could be used to express common patterns including polymorphism. Now it looks like the extended vocabulary will be based on imperative syntax with control flows. On one hand I can see control flows introduce powerful constructs. For example, this could be used (and abused) to encode validation rules that span across multiple properties. But this is a very convoluted way as a grammar to express polymorphism with efficiency in mind; the OAS author will now be responsible for writing documents with a control flow logic to achieve performance goals. In practice there will be performance discrepancies across tools. This will open the door for people writing OAI compilers and optimizers because potentially the control flow logic could be very complex for humans to write efficiently. I'm concerned this is going down the path of adding more and more imperative logic.

Your proposal can be implemented in modern JSON Schema by using and if/then construct in each branch of the oneOf, where the property is checked with const in the if branch:

Hmm, the idea of control-flow being a modern way to express polymorphism.

handrews commented 4 years ago

@sebastien-rosset I suspect what I proposed two years ago would also still work. There's another approach in Appendix E of draft 2019-09.

The reason I brought up the if option is that what discriminator is doing is implicit control flow. It is a shortcut for "if the discriminator property matches the component name, then use the component." From a JSON Schema perspective, it's problematic in part because it makes the component names ("foo" in "#/components/schemas/foo") significant, which is not how JSON Schema generally works. We prefer explicit URI-references. I think there's something else that makes it difficult for us but I don't remember what it is.

sebastien-rosset commented 4 years ago

@sebastien-rosset I suspect what I proposed two years ago would also still work. There's another approach in Appendix E of draft 2019-09.

The reason I brought up the if option is that what discriminator is doing is implicit control flow. It is a shortcut for "if the discriminator property matches the component name, then use the component."

Some concepts can be thought of in terms of control flow, but hopefully that does not imply these concepts should be modeled using keywords that are typically found in imperative programming. Today, most if not all of the concepts in JSON schema and OAI spec are specified declaratively. This provides a concise and intuitive way for authors to express intent and consumers to reason about the intent. From the tooling perspective, IMO there has been a good trade-off to make the implementation relatively easy. Other JSON schema keywords also carry an implicit control flow. For example, the maximum keyword applies IF the type is integer OR number. Similarly, the behavior of additionalProperties keyword depends on the presence and annotation results of properties and patternProperties within the same schema object. Same for the 'oneOf` keyword.

I am concerned using if/then/else is a non-intuitive, overly verbose approach to express polymorphism. Polymorphism is very a common design pattern so hopefully the schema can have constructs to specify concisely. As a comparison point, polymorphism is expressed declaratively in many modeling tools, programming languages and other specification languages. As a secondary consideration, the if/then/else construct would make it significantly harder for parsers, validators and code generators to implement.

Just to be clear, I'm not arguing the extended vocabulary should never support if/then/else, I can see benefits for other use cases, or possibly to support unusual subtyping rulesets.

From a JSON Schema perspective, it's problematic in part because it makes the component names ("foo" in "#/components/schemas/foo") significant, which is not how JSON Schema generally works. We prefer explicit URI-references. I think there's something else that makes it difficult for us but I don't remember what it is.

sebastien-rosset commented 4 years ago

Now that 3.1 RC-0 has been released, I see 3.1 RC-0 has the following text which is incorrectly referring to the 3.0 release:

The discriminator object is legal only when using one of the composite keywords oneOf, anyOf, allOf. In OAS 3.0, a response payload MAY be described to be exactly one of any number of types:

My suggestion is to fix and improve the 3.1 spec, regardless of what may come in the next release.

handrews commented 6 months ago

@sebastien-rosset is there a summary of what you think needs to be done now, based on the current state of the in-development 3.0.4 and 3.1.1 releases? I see this is mostly pre-3.1.0 and it's a lot to go through to try to figure out what might or might not apply.

github-actions[bot] commented 5 months ago

This issue has been labeled with No recent activity because there has been no recent activity. It will be closed if no further activity occurs within 28 days. Please re-open this issue or open a new one after this delay if you need to.