OAI / OpenAPI-Specification

The OpenAPI Specification Repository
https://openapis.org
Apache License 2.0
29.11k stars 9.08k forks source link

Details of model composition should be documented #1428

Closed mkistler closed 6 years ago

mkistler commented 6 years ago

OASv3.0 says:

The OpenAPI Specification allows combining and extending model definitions using the allOf property of JSON Schema, in effect offering model composition. allOf takes an array of object definitions that are validated independently but together compose a single object.

However, the spec does not define how this composition should be done. In particular, it is unclear how to interpret an "allOf" composition when a property appears in more than one component of the "allOf". For example, in this schema:

      "Thing": {
        "allOf": [
          {
            "properties": {
              "foo": {
                "type": "string"
              }
            }
          },
          {
            "properties": {
              "foo": {
                "type": "string"
              }
            },
            "required": [
              "foo"
            ]
          }
        ]
      },

it is not clear from the specification whether foo is required or optional in the Thing schema.

Multiple interpretations are possible, including "first occurrence of the property wins" or "last occurrence of the property wins". However, to be consistent with JSON Schema, it seems that we need a more complicated interpretation. The JSON Schema spec says:

An instance validates successfully against [allOf] if it validates successfully against all schemas defined by this keyword's value.

To be consistent with JSON schema, it seems that we should "compose" like-named properties in a "allOf" in a way that produces the "most restrictive" property definition. For example, a property that is required in any component of the "allOf" would be required in the composed model. Or if one property specifies a maximum, the composed property would have that maximum.

Whether it is the "composed property" interpretation or some other that is appropriate, it should be clearly documented in the spec.

handrews commented 6 years ago

@mkistler based on the JSON Schema spec:

      "Thing": {
        "allOf": [
          {
            "properties": {
              "foo": {
                "type": "string"
              }
            }
          },
          {
            "properties": {
              "foo": {
                "type": "string"
              }
            },
            "required": [
              "foo"
            ]
          }
        ]
      }

is equivalent to

      "Thing": {
        "required": ["foo"],
        "properties": {
          "foo": {
              "type": "string"
          }
        }
      }

because the "foo" subschemas are identical and "required" only needs to appear once to apply across the board.

Whether or not you actually want to perform such transforms depends on what you're trying to do. Code generation? In the latest draft of JSON Schema there is some guidance on how to find and combine all of the schema objects that apply to a given location in an instance, which makes it a bit more clear that you can do this sort of thing.

mkistler commented 6 years ago

@handrews I like your interpretation of composition. My concern is that I do not believe this is clear from the specification. The specification makes some qualifications about allOf and some other JSON Schema constructs:

their definitions were adjusted to the OpenAPI Specification. ... allOf - Inline or referenced schema MUST be of a Schema Object and not a standard JSON Schema.

So it's not clear to me if composition conforms to the JSON schema definition or some OpenAPI-specific interpretation.

And I don't think I am the only one that is unclear on this, since some current OpenAPI tools, e.g. swagger-editor, do not interpret the Thing schema above in the way you described.

So I'm suggesting a small addition the spec something along the lines of:

The composed object definition contains the union of the properties of the component objects of allOf. If a property appears in more than one component object, its attributes (required, maximum, minimum, etc) in the composed object are taken to be the most restrictive value for that attribute from the component properties.

Or maybe more simply, just state that the composition should be interpreted consistent with the JSON schema interpretation of allOf (though that leaves it up to the reader to understand the nuances of this).

handrews commented 6 years ago

Or maybe more simply, just state that the composition should be interpreted consistent with the JSON schema interpretation of allOf (though that leaves it up to the reader to understand the nuances of this).

As one of the editors of the JSON Schema spec, I would prefer this :-)

More broadly, I am hoping that we can converge the JSON Schema spec and Open API uses of JSON Schema spec to the point where little or even no special wording on the part of OAI is necessary, primarily through improving the modularity of JSON Schema. Embedding a subset of JSON Schema into a larger document is done in several projects (OAI is the largest that I know of) so we should address that on the JSON Schema spec side.

darrelmiller commented 6 years ago

@mkistler The allOf definition says that the inline schema must be an OpenAPI Schema Object so as to allow OpenAPI extensions and apply OpenAPI constraints to JSON Schema. However, the definition of Schema Object says:

Unless stated otherwise, the property definitions follow the JSON Schema.

So, based on the lack of any explicit differentiation between the behavior of JSON Schema and Schema Objects with respect to schema composition, my interpretation of the spec means that JSON Schema rules should be respected.

mkistler commented 6 years ago

I like this answer. I think we are all agreed on the desired interpretation and it's now clearly documented in this issue.

handrews commented 6 years ago

If a property appears in more than one component object, its attributes (required, maximum, minimum, etc) in the composed object are taken to be the most restrictive value for that attribute from the component properties.

This actually flows directly from JSON Schema validation as a constraint system. Constraints are always added, never removed.

It's probably better to think of this as layering constraints rather than object composition. Composition adds fields, constraints restrict fields.

mkistler commented 6 years ago

Okay ... this opens a new can of worms. @handrews what should be the "composed model" in this scenario:

      "Thing": {
        "allOf": [
          {
            "properties": {
              "foo": {
                "type": "string"
              },
              "bar": {
                "type": "string"
               }
            }
          },
          {
            "properties": {
              "foo": {
                "type": "string"
              },
              "baz": {
                "type": "string"
               }
            },
            "required": [
              "foo"
            ]
          }
        ]
      }

?

I know that the Swagger tools interpret this as a model with three properties, "foo", "bar", and "baz", but this interpretation is not "restricting" in the way you suggest.

darrelmiller commented 6 years ago

@mkistler As @handrews said, it is much easier to rationalize what JSON Schema will do if you consider it as constraints, not a model description. The combination of the two schemas says that the object must have a property called "foo" that must be a string and if it has properties called "bar" or "baz" then they must be strings.

handrews commented 6 years ago

@mkistler I understand how it doesn't look that way, but that is actually what is happening. The "problem" here (and it is true in several other uses of JSON Schema) is that in most cases, you can treat JSON Schema as if it were a data definition system instead of a constraint system and it more or less works.

I am hoping to pick up on some work that another person started to write vocabulary that handles things specific to data definition better, and clarify how data definition and validation fit with each other.

In the meantime, here's how to think about your example. I'm going to link to the latest spec for some definitions- we only just wrote those definitions now, but they hold equally true for the drafts that OpenAPI uses.

So the 1st "allOf" subschema ensures that if the instance is an object, and if there are properties named "foo" or "bar", then those property values must be strings.

The 2nd "allOf" subschema ensures that if the instance is an object, there must be a property "foo" with a string value, and if there is a property "baz", it also must have a string value.

Note that there is nothing that says there can't be other properties. There's nothing that even requires the instance be an object- an integer would successfully validate as a Thing in your example because there is no "type" assertion for the top level of the instance, only for properties.

So, this will validate successfully against the 1st "allOf" subschema, but not the 2nd:

{"foo": "oof", "baz": 42}

and this will validate against the 2nd but not the 1st:

{"foo": "oof", "bar": 123}

Neither of these examples will validate against Thing as written. However this:

{"foo": "oof", "bar": "stuff"}

will. It validates successfully against both branches of the "allOf"

So that is how the constraints work, and how combining schemas with "allOf" actually reduces the set of possible valid instances.

mkistler commented 6 years ago

Thank you! This does clarify things. In fact, as soon as I read:

The empty schema, {} has no constraints. Everything is valid against it.

the light bulb went on. I was concerned that only the properties defined in the schema could be present -- that additionalProperties was needed to signal that other properties were allowed.

But now I see how this works. And likewise, additionalProperties doesn't say whether additional properties are allowed, but rather says if additional properties are present, they must be of the type indicated.

Thanks for setting me straight.

handrews commented 6 years ago

@mkistler exactly! Another thing we clarified in recent drafts is allowing boolean values for any schema (not just subschemas for additionalProperties and additionalItems), and defining the boolean schemas as shorthand assertions that always pass (true, equivalent to {}) or always fail (false, equivalent to {"not": {}}).

When you look at it that way, "additionalProperties": false is not some odd special case- it just says if additional properties are present, then they will automatically fail validation because everything fails against the false boolean schema.

handrews commented 6 years ago

Hopefully adding those definition sections that we added in draft-07 will make this processing model a lot more clear.

spacether commented 3 years ago

@handrews collapsing allOf schemas like you have done only works if the allOf schemas overlap and one is more restrictive than the other.

Should these 3 cases be collapsed?

Example 1:

ObjWithAOrB:
  oneOf:
    - type: object
      properties:
        - a:
          type: string
      required:
      - a
    - type: object
      properties:
        - b:
          type: string
      required:
      - b

ComposedWithAOrB:
  allOf:
    - ObjWithAOrB

Example 2:

ObjWithATypeCollision:
  oneOf:
    - type: object
      properties:
        - a:
          type: string
      required:
      - a
    - type: object
      properties:
        - a:
          type: number
      required:
      - a

ComposedWithTypeCollision:
  allOf:
    - ObjWithATypeCollision

What type is ComposedWithTypeCollision.a?

Example 3

ObjWithAMultipleOf2:
    - type: object
      properties:
        - a:
          type: number
      required:
      - a
      multipleOf: 2
ObjWithAMultipleOf3:
    - type: object
      properties:
        - a:
          type: number
      required:
      - a
      multipleOf: 3

ComposedWith2Validations:
  allOf:
    - ObjWithAMultipleOf2
    - ObjWithAMultipleOf3

Where are the validations for a?

When would collapsing down allof properties not work?

  1. when they should not exist together allof oneOf (a OR b)
  2. when types collide (a is string or number)
  3. when validations must span two definitions (a must be a multiple of 2 and 3)

For these reasons, I think that the collapsing that you show only works under certain conditions.

Do you think these 3 example should be collapsed? When do you think collapsing could/should be done?

handrews commented 3 years ago

@spacether it's too late at night here for me to attempt schema algebra but let me sketch out a few things:

  1. You're using oneOf a lot here, and oneOf is much more complex to work with than allOf. You usually can't rearrange a schema to totally eliminate a oneOf, except in a few cases where other keywords produce similar behavior (e.g. the array form of type or certain arrangements of if/then/else). not is the really hard one, actually- you can rework oneOf into a combination of allOfs and nots but that's really hard to read.
  2. If the only combinatorial or conditional applicator keyword involved is allOf, you can generally collapse it, although it's non-trivial and some cases cannot be collapsed because they involve the same keyword used multiple times in a way that is not contradictory and cannot be reduced to one use of the keyword. I wrote a partial implementation of this back in the draft-07 days, although I don't think that package has been significantly maintained since I left that job.

As you can see if you poke around in that repo, I did the collapsing because we were using schemas for documentation and having a ton of allOfs show up when they were often just artifacts of making schemas modular rather than being significant to end-users cluttered the documentation. So I was reducing that clutter as much as possible. It's debatable as to whether that's really the "right" thing to do. But that sort of thing- reducing clutter without changing functionality- is the main reason I'd ever do that. As far as validation, there's no reason to collapse things.

If you're making use of annotations, which your application might interpret based on their location in the schema (e.g. a title outside of a $ref might override one inside of the $ref) then doing a lot of de-referencing and collapsing is not advised.

Philosophically, I'd like to encourage folks to design annotation vocabularies (now that extension vocabularies are supported in OAS 3.1 and JSON Schema 2020-12) to give tools hints as to how to handle things like allOf. You can find examples of this in other issues here. But the general idea is to add keywords that don't do any validation, but indicate whether an adjacent** allOf should be treated as composition or some sort of inheritance for code generation purposes. The validation is the same either way. This is how I want to bridge the gap between JSON Schema being a constraint validation system while tools like code and documentation generation want a data definition language. These are not the same things, and a lot of difficult problems have arisen by trying to make a constraint system a data definition system. Annotations provide a flexible way to solve those problems.

**adjacent keywords are keywords in the same schema object

spacether commented 3 years ago

Heads up, example3 had a typo and was updated to not use oneOf.