OAI / Overlay-Specification


Unification of Overlays and Traits #39

Open mkistler opened 5 years ago

mkistler commented 5 years ago

I'm opening this issue simply as a place to collect some ideas about how the concepts of Overlays and Traits might be brought together.

In both proposals, I think the key notion is a "fragment", which I would describe as: a "sparse" sub-object of an OpenAPI definition. In the Overlay proposal, a fragment is the value of an "Update Object" and has a type of any.

I think fragments -- which I would like to call "mixins" -- can have a more well-defined structure than just any. If we use the discriminator approach already present in OpenAPI for "mixins", we can require (and validate) conformance to a particular structure. In particular, we can require a mixin to be a "sparse" form of any well-defined OpenAPI object, e.g. Operation, Response, Parameters, or even the whole OpenAPI definition.

Mixins could be defined as just another flavor of "component". So

components:
  mixins:
    pageable:
      type: operation            # so what follows should validate as a "sparse"* OpenAPI Operation object
      < pageable parameters and response props in OAI/Overlay-Specification#38 >

Note *: "sparse" here means all props are optional

Mixins could then be included effectively anywhere in the API doc by reference:

  $mixin: "/components/mixin/pageable"

By virtue of the mixin type, it could be validated as allowed or not allowed at the point it is referenced.
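For example (a hedged sketch with illustrative names, not part of the proposal text above), a mixin declared with type: operation could be accepted where an Operation object is expected and flagged anywhere else:

paths:
  /widgets:
    get:
      $mixin: "/components/mixins/pageable"   # allowed: referenced from an Operation object
components:
  parameters:
    broken:
      $mixin: "/components/mixins/pageable"   # not allowed: a type: operation mixin inside a Parameter object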

Now Overlays can become simply a mechanism for inserting mixins and mixin references into an API document. The JMESPath mechanism of overlays still provides the ability to apply a single update to multiple target objects using wildcards, but that update would now be expressed simply as adding a "mixin" to each of the target objects.

These are just strawman ideas and I do not claim to have thought them through in any detail, but I hope they can serve as useful seeds for discussion.

Examples

Mixins are a recasting of "Traits" as described in OAI/Overlay-Specification#38. Here's how I imagine mixins could be used to apply a "pageable" trait to operations.

The "pageable" mixin would be defined in the components / mixins section of the API doc:

components:
  mixins:
    pageable:
      type: operation
      content:
        parameters:
          - name: pageSize
            in: query
            type: number
            required: false
            default: 10
          - name: pageNumber
            in: query
            type: number
            required: false
            default: 1
        responses:
          200:
            schema:
              type: object
              pagination:
                $ref: "#/definitions/PaginationFragment"

and an operation would "apply" the "pageable" mixin with a $mixin property, as follows:

paths:
  /foo:
    get:
      description: search for foo resources
      $mixin: 
        - pageable
      parameters:
        - name: q
          in: query
          type: string
          required: true
      responses:
        200:
          schema:
            type: object
            FooItems:
              type: array
              items:
                $ref: '#/definitions/FooItem'

The application of the mixin to the operation would yield an operation like:

paths:
  /foo:
    get:
      description: search for foo resources
      parameters:
        - name: q
          in: query
          type: string
          required: true
        - name: pageSize
          in: query
          type: number
          required: false
          default: 10
        - name: pageNumber
          in: query
          type: number
          required: false
          default: 1
      responses:
        200:
          schema:
            type: object
            FooItems:
              type: array
              items:
                $ref: '#/definitions/FooItem'
            pagination:
              $ref: "#/definitions/PaginationFragment"
mkistler commented 5 years ago

I listened back through the recording of the last meeting and attempted to catalog some of the issues to be addressed in this proposal. Here's what I found:

mkistler commented 5 years ago

Leveraging mixins in overlays

If we create a "mixin" concept in the OpenAPI spec, we could leverage this in the Overlay spec by making mixins the mechanism by which overlays are applied. Adapting the example from OAI/Overlay-Specification#36:

overlay: 1.0.0
info:
  title: Update many objects at once
  version: 1.0.0
updates:
  - target: paths.*.get
    mixin:
      type: operation
      content:
        x-safe: true
  - target: paths.*.get.parameters[?name=='filter' && in=='query']
    mixin:
      type: parameter
      content:
        schema:
          $ref: "/components/schemas/filterSchema"

Here, the structure and application of the "mixins" in the overlay doc would be defined in the OpenAPI spec, which would simplify the overlay mechanism for those already versed in OpenAPI.
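For concreteness, here is a hedged sketch of what those two updates would do to one matching operation (the /pets path and its filter parameter are illustrative, not taken from the issue):

paths:
  /pets:
    get:
      x-safe: true                      # added by the first update's mixin
      parameters:
        - name: filter                  # parameter already present in the base document
          in: query
          schema:
            $ref: "/components/schemas/filterSchema"   # added by the second update's mixin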

mkistler commented 5 years ago

How are mixins "applied"?

The basic idea is that the mixin "content" is JSON merged (or the YAML equivalent) with the element that immediately contains it. So with these definitions:

thing:
  foo: foo
  bar: bar
  $mixin: /components/mixins/bazqux

components:
  mixins:
    bazqux:
      content: 
        baz: baz
        qux: qux

the "realized" spec is:

thing:
  foo: foo
  bar: bar
  baz: baz
  qux: qux

I believe that this description could be equally applied to arrays. So for example:

things:
  - foo
  - bar
  - $mixin: /components/mixins/bazqux

components:
  mixins:
    bazqux:
      content: 
        - baz
        - qux

would become

things:
  - foo
  - bar
  - baz
  - qux
tedepstein commented 5 years ago

Notes from TSC Meeting, 2019-02-28:

whitlockjc commented 5 years ago

I like where $mixin is going, but due to the similarity to $ref, I'm not quite sure I agree with the usage model. The reference identifier of a $ref is a JSON Pointer. Using a JSON Pointer for the reference identifier allows for referencing locations in the local document and remote documents. With $mixin, the reference identifier is an arbitrary string used as an identifier: a key in some object defined in a pre-determined location. Using an id-like lookup poses two potential problems:

  1. It only supports local references (or so it would seem)
  2. It becomes document/tooling specific as to where the mixin definition container lives (where the id is looked up)

I personally would rather see $mixin work like $ref, where the reference identifier(s) are JSON Pointers instead of identifiers. OpenAPI can still use #/components/mixins as an approved location for creating similar things, much like we use other parts of #/components, but using JSON Pointers allows more flexibility.
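For illustration, a hedged sketch of what JSON-Pointer-style $mixin references might look like (the component and file names are made up):

get:
  $mixin: "#/components/mixins/pageable"                      # local reference
post:
  $mixin: "shared-mixins.yaml#/components/mixins/auditable"   # reference into a remote document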

Beyond OpenAPI

Have we thought about taking an approach that is beyond OpenAPI? I could see this feature becoming an enhancement/extension/replacement to JSON References. It would be slick to define the JSON Pointer to the thing(s) being referenced and the "resolution action" being performed (merge, replace, ...). Below is an example (shooting from the hip here, no real thought put into it):

Unresolved

me:
  firstName: Jeremy
  lastName: Whitlock
refd:
  $ref: '#/me'
  # The default '$action' is 'replace' and can be omitted
  # $action: replace
merged:
  summary: This is me
  $ref: '#/me'
  $action: merge
# If the list of actions is only merge and replace, maybe we use a boolean to
# indicate performing the non-default action.
# refdV2:
#   $ref: '#/me'
#   # The default '$merge' is 'false' and can be omitted
#   # $merge: false
# mergedV2:
#   summary: This is me
#   $ref: '#/me'
#   $merge: true

Resolved

me:
  firstName: Jeremy
  lastName: Whitlock
refd:
  firstName: Jeremy
  lastName: Whitlock
merged:
  summary: This is me
  firstName: Jeremy
  lastName: Whitlock
pjmolina commented 5 years ago

Regarding @whitlockjc's Beyond OpenAPI idea: that's exactly what I was suggesting with this proposal, Canonical Form.

handrews commented 5 years ago

@tedepstein

JSON Schema tried to agree a merge feature, put a lot of effort into this, but ultimately had to abandon those efforts. Maybe we should get some input to see where this effort hit the wall, and see if there's something we should do to avoid these problems.

That's not quite what happened. The TL;DR is that we went through an exhaustive effort to analyze various problems and proposed solutions that had been plaguing the project since before draft-04. $merge was one proposal, and it was ultimately decisively rejected. We did not fail to produce a merge feature; we decided that one was not just unnecessary but clearly undesirable for our project.

This does not necessarily mean that $mixin is wrong for OAS; I'll come back to that at the end.


The long version is very long (~15 issues over 2 repositories, two completely different sets of editors, the first of which abandoned the project over disagreements on this topic, and a total of ~500 issue comments on GitHub, ~235 on the final issue alone).

Fundamentally, JSON Schema has all of the tools that it needs, with keywords like allOf, etc., for effective modularity and re-use. The biggest unsolved use case was the desire to forbid properties that are not defined anywhere, regardless of how many *Of or if or other combinatorial keywords are used to break the larger schema up into components. additionalProperties notoriously cannot "see through" such constructs.

For allOf, you could actually solve this with a pre-processing step. But for oneOf, anyOf, if/then/else and possibly other things I'm forgetting (oh yeah, dependencies, I always forget that keyword), you need runtime information in order to get the correct desired behavior. So we came up with the unevaluatedProperties keyword, which has the necessary runtime behavior. The OAS 3 schema, in a refactored form, illustrates this perfectly.
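A hedged illustration of that difference, written as JSON Schema in YAML with made-up property names: additionalProperties cannot "see" the properties declared inside the allOf branches, while the proposed unevaluatedProperties can.

allOf:
  - properties:
      name: { type: string }
  - properties:
      age: { type: integer }
# additionalProperties: false here would also reject "name" and "age",
# because it only considers properties declared in this same schema object.
unevaluatedProperties: false   # rejects only properties not matched by either allOf branch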

Now, here's the key part: $merge cannot solve this problem!

It can solve simpler forms of "I want to splice properties from X into Y", but not the full problem that you see in complex schemas like OAS.

It also makes a mess of schema implementation in a number of problematic ways, because it splices arbitrary stuff together. It's hard to reason about that in code. It's kind of like splicing lines of code from one function into another. It's... not a good interface.

In JSON Schema, each schema object has well-defined results as a function of its keywords (which include various ways of incorporating results from subschemas). $merge breaks that property, while unevaluatedProperties does not.

We only found one person who wanted to splice arbitrary things in, and his use case involved the fact that the source from which he wanted to splice was a document that he had no control over as a government contractor due to security regulations. We decided that that was too much of a niche use case to motivate such a powerful feature.

One popular implementation does have an extension for $merge, and the fact that that is used was cited as a reason to add it. However, that is not, by itself, a valid argument; people use the keywords you give them. I expect people will happily use unevaluatedProperties for this once they learn it.


The other main use case for $merge was stuff like this:

{
    "title": "Foo",
    "allOf": {"$ref": "#/definitions/bar"}
}

where the title should override any title in the #/definitions/bar schema. This use case was all about annotations, rather than validation. The tricky part of solving this use case was:

So in the end, we formalized how annotations are collected, in order to make it easy to figure out that, if there are two values for title for a location in the instance, you can determine which value comes from what part of the schema. If you want to take the one that appears outside of a $ref, you can do that. If you want to combine them somehow, you can do that instead, and so on.


So. The problem that OAS faces is that, since your document is composed of many different pieces with different rules for evaluating them, you cannot make use of all of the JSON Schema features that support modularity and re-use. So you need to come up with something else, and I guess that's $mixin.

If it is used in OAS but not allowed to impact schema objects, then I'm entirely fine with it.

If you decide to make it a feature of the OAS Schema Object, then I might as well give up on converging OAS Schema with JSON Schema. Although I would be open to handing JSON Schema over to this group (assuming the other project editors there agreed). I spent a year on this topic already and have what I consider very good and extensively researched reasons for not including such a thing in JSON Schema, and I have less than zero desire to revisit it.

But the possibility of handing it over is a sincere offer. I am finally (as of this week) making progress on getting draft-08 out the door. I will definitely finish that. I intend to do one more draft as several things need wrapping up. But if the community wants to go in a different direction I would not, at this point, mind being relieved of the responsibility. I can't speak for the other JSON Schema folks, though.

[EDIT: OK that got a little pessimistic at the end there didn't it? Sorry, it's been a rough week in JSON Schema land.]

tedepstein commented 5 years ago

@handrews , thanks for the detailed background, and sorry to hear that JSON Schema land has been rough. If that's tainting your perspective, maybe give it a little time and reconsider this conclusion in particular:

If you decide to make it a feature of the OAS Schema Object, then I might as well give up on converging OAS Schema with JSON Schema.

I don't think it's our intent to make it specifically a feature of OAS Schema Object. But it might be difficult, or just awkward, to insulate Schema Object from traits.

Could we think of traits (or mixins), and trait application, as a separate layer of processing, similar to a schema generator?

The intent is not to change what's considered a valid schema, a valid parameter, response, etc. The intent is to give users a consistent, generalized way of composing those objects through a purely mechanical (not semantic) and highly flexible form of composition. Validation takes place after traits have been applied, so the resulting objects have to comply with all of the usual validation rules.

This doesn't prevent someone from misusing traits as a replacement for better, more semantically rich and appropriately constrained JSON Schema affordances, like unevaluatedProperties, allOf, etc. But we can do our part to discourage this kind of misuse.

I think the challenge for OpenAPI comes down to what you said about JSON Schema:

Fundamentally, JSON Schema has all of the tools that it needs with keywords like allOf, etc. for effective modularity and re-use.

The problem is that we cannot say the same thing about OpenAPI. There are lots of odd cases that can be solved by traits but would require a much bigger investment to solve by more specialized means. This is my personal perspective, and I could try to elaborate, but maybe you and others should have a chance to respond first. I've exceeded my quota of monologues for today.

handrews commented 5 years ago

@tedepstein

Could we think of traits (or mixins), and trait application, as a separate layer of processing, similar to a schema generator?

That possibility for $merge was discussed extensively. In that case, it was not possible, because of how it interacts with the (necessarily) lazy evaluation of $ref. There is no way to pre-process all uses of $merge out.

I have not gone through all of the comments above, so perhaps $mixin does not have that problem, and is purely a static edit of the file. Perhaps that's what you mean by mechanical rather than semantic. Ultimately, it was not possible to separate the semantic effects of $merge from the more obvious mechanical manipulations.

If so, then it could be a totally separate thing from the JSON Schema spec, and I can tell the inevitable people who show up to demand it to just use it separately.

MikeRalphson commented 5 years ago

Though conceptually unifying overlays and traits / mixins sounds desirable, I have, over the last two TSC meetings, begun to feel that we are not only toying with introducing potentially large areas of complexity, as @handrews alludes to, which we will struggle to resolve in the putative v3.1 timeframe, but also drifting further and further from what people are likely to want (and have actually stated they want) to use traits for. (From here on I'm going to separate the terms "trait" and "mixin": whatever we decide, I would like to avoid the term 'mixin' in the spec, as Open-RPC have just used mixin to mean something else, and APIs themselves often use the term as a method of requesting additional information in the response representation.)

As I understand it from the linked issues (some of which have the most positive :+1: reactions of any in the repo), the main driver for traits is to prevent repetition within an OAS document, specifically in the areas of request parameters and response headers.

An oft-stated case is where someone wanting to describe an API says they wish to add a set of parameters or response headers to "every" operation. (I know that @webron has said that wherever someone uses the word "every" they mean "most places" and would like some kind of exception mechanism, but I feel we don't necessarily have to accommodate such inconsistencies.) The mixin discussions did not seem to address this global applicability requirement.

What I have not seen much (if any) call for from users is 'sparse object updates', or the ability to 'mixin' to other areas of the specification, such as request/response schemas. If this is deemed necessary (and I know that RAML traits work like this), then I feel that making overlays a core part of the OAS specification is the way to go. Overlay objects would live under components/overlays, and an overlays array property (at whichever levels we thought appropriate: top, pathItem, operation, etc.) would apply them.

Then an 'overlay document' just becomes a case (like reusable schema component libraries) of an OAS document which has no paths.

If necessary, the target property of an overlay object could become a targets array, to make application to diverse areas of the target document easier.
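A hedged sketch of the shape being suggested (the overlays, targets, and update property names are illustrative, not settled spec text):

openapi: 3.1.0
info:
  title: Example API
  version: 1.0.0
overlays:                       # applied at the top level; could also appear on a pathItem or operation
  - $ref: "#/components/overlays/rateLimitHeaders"
paths:
  # ... normal path items ...
components:
  overlays:
    rateLimitHeaders:
      targets:                  # a targets array, per the comment above
        - paths.*.*.responses.*
      update:
        headers:
          X-RateLimit-Remaining:
            schema:
              type: integer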

If we don't feel that overlays handle all the cases which traits are required to (i.e. we wish to simply point outwards towards a trait, not towards an overlay which points back inward to areas of the document), then something like a trait object as explored in this gist might be worth considering.

mkistler commented 5 years ago

@handrews I apologize if I have reopened some old wounds. I don't know all the history of $merge in JSON Schema, but I do think that OpenAPI needs some more powerful composition mechanisms than it currently possesses (echoing @tedepstein's sentiments above).

Regarding

It also makes a mess of schema implementation in a number of problematic ways, because it splices arbitrary stuff together.

If that's a problem, we could require that mixins specify their type (that was in my original proposal, but @darrelmiller suggested we make it optional). Making type required would mean the mixin content would not be "arbitrary", but in fact well-defined and could be validated.

But I suppose another obvious direction we could take here is to eliminate the current restrictions for allOf, oneOf, anyOf in OAS. If these provide all the necessary mechanics for composition in JSON schema, then we should take a hard look at whether this is also the right solution for OAS.

handrews commented 5 years ago

@mkistler

If that's a problem, we could require that mixins specify their type (that was in my original proposal, but @darrelmiller suggested we make it optional). Making type required would mean the mixin content would not be "arbitrary", but in fact well-defined and could be validated.

type is actually not the issue at all.

Hmmm... I guess I'm just going to have to explain the generalized JSON Schema processing model, as it has developed in order to enable properly supporting non-validation vocabularies such as code, ui, and documentation generation. Seeing as OAS is one of the major motivations behind this (particularly for code and doc gen), it's worth a look anyway.

This is going to be long and someone will no doubt complain about that, but the short version does not seem to be getting across the full complexity of the problem.

It might take me a couple of days to get it written up. I've got about half of it so far but don't have more time to spend on it today.

handrews commented 5 years ago

@mkistler @darrelmiller @MikeRalphson @tedepstein given https://github.com/OAI/OpenAPI-Specification/pull/1865#issuecomment-472953005 it sounds like there's not much point in me writing up why this is such a problem for JSON Schema.

I'm still happy to do so, because I think it is important, but do let me know as it's substantial work to explain it all and I don't want to bother if this is a done deal.

tedepstein commented 5 years ago

Hi @handrews,

I'm not sure how the comment you referenced changes the situation. But I don't think you should invest a lot of time in this.

Could we think of traits (or mixins), and trait application, as a separate layer of processing, similar to a schema generator?

That possibility for $merge was discussed extensively. In that case, it was not possible, because of how it interacts with the (necessarily) lazy evaluation of $ref. There is no way to pre-process all uses of $merge out.

I think I understand.

Do I have the right idea?

I have not gone through all of the comments above, so perhaps $mixin does not have that problem, and is purely a static edit of the file. Perhaps that's what you mean by mechanical rather than semantic.

When I said that traits are "purely mechanical," that is probably not right. It's more accurate to say that traits would be applied before validation of the resulting, modified OAS document. And while it might be possible for the application of valid traits to a valid schema to produce an invalid schema, problems like that should be evident at design time.

I had not considered lazy evaluation of $refs, so maybe that changes things.

Also, we learned some things from @usarid on today's call:

We agreed on today's call to revisit the use cases driving discussions about Traits and Overlays. We want to make sure we have a representative set of use cases that cover the most common patterns, and determine which of these call for an internal composition feature (like traits) vs. external (like overlays), where information is being added at a different time, or by different parties, from the base document. Some use cases might reasonably call for both.

Once we have that, we should have a better sense of what traits and overlays might do. Until then maybe we don't need to go too deep into the Schema implications.

handrews commented 5 years ago

@tedepstein OK, after a lot of thought, I've distilled this down to a relatively concise explanation.

Part of the problem with $merge is potentially unexpected behavior as large systems grow and change.

If I have a schema that looks like:

{
    "$id": "https://example.com/schemas/foo",
    "title": "Foo",
    "type": "object",
    "properties": {
        "specialProp1": {"type": "integer"},
        "specialProp2": {"type": "boolean"}
    },
    "additionalProperties": {"type": "string}
}

I publish this schema as the schema that officially validates Foos.

You decide that you have a FooBar which is pretty close to being a Foo but has one more special property in it. So you $merge or $mixin or whatever:

{
    "$id": "https://example.com/schemas/foobar",
    "title": "FooBar",
    "$mixinMergeThing": [
        {"$ref": "https://example.com/schemas/foo"},
        {
            "required": ["specialProp3"],
            "properties": {"specialProp3": {"type": "boolean"}}
        }
    ]
}

Because of how properties and additionalProperties interact, this has the effect that, if an instance has a property named "specialProp3", then to validate as a Foo, it would have to be a string, but to validate as a FooBar, it would have to be a boolean.

With all of the current and planned features of JSON Schema, this is intentionally not possible. If you build on a Foo, then your derived schema MUST satisfy all of the constraints specified by the Foo schema.

But in this example, FooBar is derived from (in the sense of depending on / building on) Foo, and yet (due to the required) a valid FooBar is in fact never a valid Foo. You can, in fact, use this sort of keyword to slice things up and produce new schemas that have no clear relationship to the constituent schemas.

JSON Schema is a constraint system. A fundamental rule of such a constraint system is that you cannot lift constraints. You can add more, and that is how things are re-used. But you cannot lift them. unevaluatedProperties lets you do some complex things, but it is still adding constraints.
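To contrast with the $mixinMergeThing example above, here is a hedged sketch (JSON Schema written as YAML) of building on Foo with constraints only. allOf can only add constraints: Foo's additionalProperties still applies to "specialProp3", so it would have to be both a string and a boolean, and no instance can satisfy the combined schema; nothing declared in Foo can be lifted or rewritten.

$id: https://example.com/schemas/foobar
title: FooBar
allOf:
  - $ref: "https://example.com/schemas/foo"
  - required: [specialProp3]
    properties:
      specialProp3:
        type: boolean     # conflicts with Foo's additionalProperties: {type: string}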

Once all relative URIs in the schema are resolved (interactions between $id and $ref), each schema object's constraints can be evaluated independent of any parent or sibling schemas. First you evaluate all subschemas, and then you evaluate the local keywords.

If you force some sort of $merge behavior into JSON Schema in the context of OpenAPI, then it is no longer a proper constraint system. While the independent evaluation of objects still technically exists in the form of the lazily evaluated merge results, schema authors cannot see those objects easily. In terms of what you can see, you can no longer trust that your schema object is evaluated independently.

The author of the Foo schema may not have any idea that there is a FooBar that splices their Foo schema. But now, instead of the Foo schema being a properly encapsulated description of valid Foos, it is just a source of keywords that can be rearranged arbitrarily. There is no encapsulation anymore.

I have spent pretty much the entire current draft cycle focused on keeping people from breaking JSON Schema's fundamental constraint and encapsulation design.

All of the work on modular extensibility, keyword classification, and unevaluatedProperties has been towards that goal. unevaluatedProperties is obvious, but the rest of it I have done in order to enable users (specifically OAS) to build things like code generation vocabularies out of annotations, and therefore not need $merge splicing features that ruin the constraint system in order to get the desired results.

That required:

It has been a lot of work, and not just by me. But if OpenAPI decides to allow schema mixins... well, you're probably one of the biggest users of JSON Schema. People who are looking for shortcuts instead of building sustainable systems will demand the mixin feature be added to JSON Schema proper instead of learning all of the things that we did to build a better system.

I realize that not everyone cares about JSON Schema having a consistent, extensible, and elegant underlying model. Although I assert that having such an underlying model would make JSON Schema more successful in the long run as use cases grow and change. I certainly don't expect OpenAPI to consider this property of JSON Schema a goal.

But I hope this makes it clear why I'm not happy with this direction and how it is likely to impact JSON Schema if chosen.

tedepstein commented 5 years ago

Thanks for the lucid explanation @handrews. I think preserving the integrity of JSON Schema's processing model and composition semantics should be an important design goal for us.

I really cannot say much more without looking more carefully at use cases.

But I do think part of our problem is that we're (still) trying to use JSON Schema as a type definition language. A prototypical use case for mix-ins goes something like, "I want to add these properties to the object schema of the request body." But you're not really adding properties; you're adding constraints, which has a whole different set of implications. And the nature of the "adding" operation needs much more careful thought than we're accustomed to giving it.