mkistler opened 5 years ago
I listened back through the recording of the last meeting and attempted to catalog some of the issues to be addressed in this proposal. Here's what I found:

- `type`: optional
If we create a "mixin" concept in the OpenAPI spec, we could leverage this in the Overlay spec by making mixins the mechanism by which overlays are applied. Adapting the example from OAI/Overlay-Specification#36:
```yaml
overlay: 1.0.0
info:
  title: Update many objects at once
  version: 1.0.0
updates:
  - target: paths.*.get
    mixin:
      type: operation
      content:
        x-safe: true
  - target: paths.*.get.parameters[?name=='filter' && in=='query']
    mixin:
      type: parameter
      content:
        schema:
          $ref: "/components/schemas/filterSchema"
```
Here, the structure and application of the "mixins" in the overlay doc would be defined in the OpenAPI spec, which would simplify the overlay mechanism for those already versed in OpenAPI.
The basic idea is that the mixin "content" is JSON merged (or the YAML equivalent) with the element that immediately contains it. So with these definitions:
```yaml
thing:
  foo: foo
  bar: bar
  $mixin: /components/mixins/bazqux

components:
  mixins:
    bazqux:
      content:
        baz: baz
        qux: qux
```
the "realized" spec is:
```yaml
thing:
  foo: foo
  bar: bar
  baz: baz
  qux: qux
```
I believe that this description could be equally applied to arrays. So for example:
```yaml
things:
  - foo
  - bar
  - $mixin: /components/mixins/bazqux

components:
  mixins:
    bazqux:
      content:
        - baz
        - qux
```
would become
```yaml
things:
  - foo
  - bar
  - baz
  - qux
```
Notes from TSC Meeting, 2019-02-28:

- Mixins should be defined in the `components` object in the current document.
- The `$mixin` (or `$mixins`?) value should be an array of JSON references, so multiple mixins can be applied, and so the order of application can be deterministic.
- Use `tags`, or introduce another similar feature, to allow certain types of objects (e.g. parameters) that occur in arrays to carry tag values, and elsewhere specify the sort order of those objects using an array of tag names.
- Rather than specifying how `$ref` and `$mixins` can be used in combination, allow `$mixins` as an alternative (not in addition to `$ref`). Where `$mixins` appears on its own, with a single mixin array element and with no sibling properties, it behaves like a `$ref`. However:
  - With `$mixins`, you have the option of including more than one.
  - Like `$ref`, `$mixins` can refer to objects that are complete, having all required properties. If the referenced object is located in `/components/schemas`, `/components/parameters`, or one of the other component maps, it could be considered "pre-validated."
  - Unlike `$ref`, `$mixins` can refer to objects that are "sparse" subsets of the expected object type, having only some required properties, and this is expected to be the case with many mixin objects. It is up to the API designer (with help from the editor, if available) to ensure that the referring object is complete and valid after the application of mixins.
- Extend the `$mixin` property to allow array elements that are either references to fully formed component objects, or references to appropriately typed mixin objects. A fully formed component object will contain the contributed properties directly, whereas a mixin object will have the contributed properties under a `content` container property.
- `- $mixin: /components/mixins/bazqux` would resolve to a nested array, not append to the containing array, because the context of the `$mixin` is the array element, not the array itself. So this is a limitation: see the `pageable` example here.
- Allow `$mixin` anywhere in an OpenAPI spec. Like `$ref` (and even `x-` specification extensions), OpenAPI will specify which objects allow `$mixin` properties.
- Put the merge keyword (`$mixins`, `$traits`, or whatever it is eventually called) in the merge target object, specifying merge operations in-context.

I like where `$mixin` is going but, due to the similarity to `$ref`, I'm not quite sure I agree with the usage model. The reference identifier of a `$ref` is a JSON Pointer. Using a JSON Pointer for the reference identifier allows for referencing locations in the local document and remote documents. With `$mixin`, the reference identifier is an arbitrary string that is used like an identifier, where it's a key in some object defined in a pre-determined location. Using an id-like lookup poses two potential problems.

I personally would rather see `$mixin` work like `$ref`, where the reference identifier(s) are JSON Pointers instead of identifiers. OpenAPI can still use `#/components/mixins` as an approved location for creating similar things, much like we use other parts of `#/components`, but using JSON Pointers allows more flexibility.
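A sketch of what that JSON-Pointer-based usage might look like (the `pageable` mixin name and its content are invented here, not taken from the thread):

```yaml
# Hypothetical: $mixin values as JSON Pointers rather than bare identifiers.
components:
  mixins:
    pageable:
      content:
        parameters:
          - name: limit
            in: query
            schema: {type: integer}

paths:
  /things:
    get:
      # A JSON Pointer can address the local document...
      $mixin: ['#/components/mixins/pageable']
      # ...or a remote one, e.g.:
      # $mixin: ['common.yaml#/components/mixins/pageable']
```

The pointer form keeps `#/components/mixins` as a conventional location while still permitting references into other documents.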
Have we thought about taking an approach that is beyond OpenAPI? I could see this feature becoming an enhancement/extension/replacement to JSON References. It would be slick to define the JSON Pointer to the thing(s) being referenced and the "resolution action" being performed (`merge`, `replace`, ...). Below is an example (shooting from the hip here, no real thought put into it):
Unresolved

```yaml
me:
  firstName: Jeremy
  lastName: Whitlock
refd:
  $ref: '#/me'
  # The default '$action' is 'replace' and can be omitted
  # $action: replace
merged:
  summary: This is me
  $ref: '#/me'
  $action: merge
# If the list of actions is only merge and replace, maybe we use a boolean to
# indicate performing the non-default action.
# refdV2:
#   $ref: '#/me'
#   # The default '$merge' is 'false' and can be omitted
#   # $merge: false
# mergedV2:
#   summary: This is me
#   $ref: '#/me'
#   $merge: true
```
Resolved

```yaml
me:
  firstName: Jeremy
  lastName: Whitlock
refd:
  firstName: Jeremy
  lastName: Whitlock
merged:
  summary: This is me
  firstName: Jeremy
  lastName: Whitlock
```
About @whitlockjc's "Beyond OpenAPI" suggestion: that's exactly what I was suggesting with this proposal: Canonical Form.
@tedepstein

> JSON Schema tried to agree on a merge feature, put a lot of effort into this, but ultimately had to abandon those efforts. Maybe we should get some input to see where this effort hit the wall, and see if there's something we should do to avoid these problems.
That's not quite what happened. The TL;DR is that we went through an exhaustive effort to analyze various problems and proposed solutions that had been plaguing the project since before draft-04. `$merge` was one proposal, and it was ultimately decisively rejected. We did not fail to produce a merge feature; we decided that one was not just unnecessary but clearly undesirable for our project.
This does not necessarily mean that `$mixin` is wrong for OAS; I'll come back to that at the end.
The long version is very long (~15 issues over 2 repositories, two completely different sets of editors, the first of which abandoned the project over disagreements on this topic, and a total of ~500 issue comments on GitHub, ~235 on the final issue alone).
Fundamentally, JSON Schema has all of the tools that it needs with keywords like `allOf`, etc. for effective modularity and re-use. The biggest unsolved use case was the desire to forbid properties that are not defined anywhere, regardless of how many `*Of` or `if` or other combinatorial keywords are used to break the larger schema up into components. `additionalProperties` notoriously cannot "see through" such constructs.
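A minimal illustration of that blind spot (this schema is my own invention, not from the thread):

```json
{
  "allOf": [
    {"properties": {"name": {"type": "string"}}}
  ],
  "properties": {"id": {"type": "integer"}},
  "additionalProperties": false
}
```

Here `additionalProperties: false` only considers its sibling `properties` (`id`), so an instance like `{"name": "x", "id": 1}` is rejected even though `name` is declared inside the `allOf`.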
For `allOf`, you could actually solve this with a pre-processing step. But for `oneOf`, `anyOf`, `if`/`then`/`else`, and possibly other things I'm forgetting (oh yeah, `dependencies`, I always forget that keyword), you need runtime information in order to get the correct desired behavior. So we came up with the `unevaluatedProperties` keyword, which has the necessary runtime behavior. The OAS 3 schema, in a refactored form, illustrates this perfectly.
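A sketch of the same schema using the new keyword (my own example, assuming the draft 2019-09 semantics of `unevaluatedProperties`):

```json
{
  "allOf": [
    {"properties": {"name": {"type": "string"}}}
  ],
  "properties": {"id": {"type": "integer"}},
  "unevaluatedProperties": false
}
```

Unlike `additionalProperties`, `unevaluatedProperties` is applied after the results of `allOf` (and `oneOf`, `if`/`then`/`else`, etc.) are collected, so `name` counts as evaluated and `{"name": "x", "id": 1}` passes, while a truly undeclared property still fails.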
Now, here's the key part: `$merge` cannot solve this problem!
It can solve simpler forms of "I want to splice properties from X into Y", but not the full problem that you see in complex schemas like OAS.
It also makes a mess of schema implementation in a number of problematic ways, because it splices arbitrary stuff together. It's hard to reason about that in code. It's kind of like splicing lines of code from one function into another. It's... not a good interface.
In JSON Schema, each schema object has well-defined results as a function of its keywords (which include various ways of incorporating results from subschemas). `$merge` breaks that property, while `unevaluatedProperties` does not.
We only found one person who wanted to splice arbitrary things in, and his use case involved the fact that the source from which he wanted to splice was a document that he had no control over as a government contractor due to security regulations. We decided that that was too much of a niche use case to motivate such a powerful feature.
One popular implementation does have an extension for `$merge`, and the fact that it is used was cited as a reason to add it. However, that is not, by itself, a valid argument; people use the keywords you give them. I expect people will happily use `unevaluatedProperties` for this once they learn it.
The other main use case for `$merge` was stuff like this:

```json
{
  "title": "Foo",
  "allOf": [{"$ref": "#/definitions/bar"}]
}
```

where the `title` should override any title in the `#/definitions/bar` schema. This use case was all about annotations, rather than validation. The tricky part was solving it without `$merge`.

So in the end, we formalized how annotations are collected, in order to make it easy to figure out, if there are two values for `title` for a location in the instance, which value comes from what part of the schema. If you want to take the one that appears outside of a `$ref`, you can do that. If you want to combine them somehow, you can do that instead. Etc.
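A sketch of how that plays out (the `Bar` definition and its title are invented for illustration):

```json
{
  "title": "Foo",
  "allOf": [{"$ref": "#/definitions/bar"}],
  "definitions": {
    "bar": {"title": "Bar", "type": "object"}
  }
}
```

Evaluating an instance against this schema collects two `title` annotations for the root location: "Foo" from the top-level schema object and "Bar" from inside the `$ref`. A consumer can then prefer the outer one, or combine them, without any splicing of keywords.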
So. The problem that OAS faces is that, since your document is composed of many different pieces with different rules for evaluating them, you cannot make use of all of the JSON Schema features that support modularity and re-use. So you need to come up with something else, and I guess that's `$mixin`.
If it is used in OAS but not allowed to impact schema objects, then I'm entirely fine with it.
If you decide to make it a feature of the OAS Schema Object, then I might as well give up on converging OAS Schema with JSON Schema. Although I would be open to handing JSON Schema over to this group (assuming the other project editors there agreed). I spent a year on this topic already and have what I consider very good and extensively researched reasons for not including such a thing in JSON Schema, and I have less than zero desire to revisit it.
But the possibility of handing it over is a sincere offer. I am finally (as of this week) making progress on getting draft-08 out the door. I will definitely finish that. I intend to do one more draft as several things need wrapping up. But if the community wants to go in a different direction I would not, at this point, mind being relieved of the responsibility. I can't speak for the other JSON Schema folks, though.
[EDIT: OK that got a little pessimistic at the end there didn't it? Sorry, it's been a rough week in JSON Schema land.]
@handrews, thanks for the detailed background, and sorry to hear that JSON Schema land has been rough. If that's tainting your perspective, maybe give it a little time and reconsider this conclusion in particular:
> If you decide to make it a feature of the OAS Schema Object, then I might as well give up on converging OAS Schema with JSON Schema.
I don't think it's our intent to make it specifically a feature of OAS Schema Object. But it might be difficult, or just awkward, to insulate Schema Object from traits.
Could we think of traits (or mixins), and trait application, as a separate layer of processing, similar to a schema generator?
The intent is not to change what's considered a valid schema, a valid parameter, response, etc. The intent is to give users a consistent, generalized way of composing those objects through a purely mechanical (not semantic) and highly flexible form of composition. Validation takes place after traits have been applied, so the resulting objects have to comply with all of the usual validation rules.
This doesn't prevent someone from misusing traits as a replacement for better, more semantically rich and appropriately constrained JSON Schema affordances, like `unevaluatedProperties`, `allOf`, etc. But we can do our part to discourage this kind of misuse.
I think the challenge for OpenAPI comes down to what you said about JSON Schema:
> Fundamentally, JSON Schema has all of the tools that it needs with keywords like `allOf`, etc. for effective modularity and re-use.
The problem is that we cannot say the same thing about OpenAPI. There are lots of odd cases that can be solved by traits, that would require a much bigger investment to solve by more specialized means. This is my personal perspective, and I could try to elaborate, but maybe you and others should have a chance to respond first. I've exceeded my quota of monologues for today.
@tedepstein
> Could we think of traits (or mixins), and trait application, as a separate layer of processing, similar to a schema generator?
That possibility for `$merge` was discussed extensively. In that case, it was not possible, because of how it interacts with the (necessarily) lazy evaluation of `$ref`. There is no way to pre-process all uses of `$merge` out.

I have not gone through all of the comments above, so perhaps `$mixin` does not have that problem, and is purely a static edit of the file. Perhaps that's what you mean by mechanical rather than semantic. Ultimately, it was not possible to separate the semantic effects of `$merge` from the more obvious mechanical manipulations.

If so, then it could be a totally separate thing from the JSON Schema spec, and I can tell the inevitable people who show up to demand it to just use it separately.
Though conceptually unifying overlays and traits/mixins sounds desirable, over the last two TSC meetings I have begun to feel that we have been toying with introducing potentially large areas of complexity, as @handrews alludes to, which we will struggle to resolve in the putative v3.1 timeframe. We are also drifting further and further from what people are likely to want (and have actually stated they want) to use traits for. (From here on I'm going to separate the terms "trait" and "mixin": whatever we decide, I would like to avoid the term "mixin" in the spec, as Open-RPC have just used mixin to mean something else, and APIs themselves often use the term as a method of requesting additional information in the response representation.)
As I understand it from the linked issues (some of which have the most positive :+1: reactions of any in the repo), the main driver for traits is to prevent repetition within an OAS document, specifically in the areas of request `parameters` and response `headers`.
An oft-stated case is where someone wanting to describe an API says they wish to add a set of parameters or response headers to "every" `operation`. (I know that @webron has said that wherever someone uses the word "every" they mean "most places" and would like some kind of exception mechanism, but I feel we don't necessarily have to accommodate such inconsistencies.) The mixin discussions did not seem to address this global applicability requirement.
What I have not seen much (if any) call for from users is 'sparse object updates', or the ability to 'mixin' to other areas of the specification, such as request/response `schemas`. If this is deemed necessary (and I know that RAML traits work like this), then I feel that making overlays a core part of the OAS specification is the way to go. `overlay` objects would live under `components/overlays`, and an `overlays` array property (at whichever levels we thought appropriate: top, `pathItem`, `operation`, etc.) would apply them.

Then an 'overlay document' just becomes a case (like reusable `schema` component libraries) of an OAS document which has no `paths`.
If necessary, the `target` property of an `overlay` object could become a `targets` array, to make application to diverse areas of the target document easier.
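Under that suggestion, a core overlay component might look roughly like this (the component name, target expressions, and `update` key are all hypothetical):

```yaml
components:
  overlays:
    rateLimitHeaders:
      targets:
        - paths.*.get.responses.200
        - paths.*.post.responses.200
      update:
        headers:
          X-RateLimit-Remaining:
            schema: {type: integer}
```

A `targets` array lets one overlay reach several areas of the document without duplicating the update content.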
If we don't feel that overlays handle all cases which traits are required to (i.e. we wish to simply point outwards towards a trait, not towards an overlay which points back inward to areas of the document), then something like a `trait` object, as explored in this gist, might be worth considering.
@handrews I apologize if I have reopened some old wounds. I don't know all the history of `$merge` in JSON Schema, but I do think that OpenAPI needs some more powerful composition mechanisms than it currently possesses (echoing @tedepstein's sentiments above).
Regarding
> It also makes a mess of schema implementation in a number of problematic ways, because it splices arbitrary stuff together.
If that's a problem, we could require that mixins specify their `type` (that was in my original proposal, but @darrelmiller suggested we make it optional). Making `type` required would mean the mixin content would not be "arbitrary", but in fact well-defined, and could be validated.
But I suppose another obvious direction we could take here is to eliminate the current restrictions on `allOf`, `oneOf`, `anyOf` in OAS. If these provide all the necessary mechanics for composition in JSON Schema, then we should take a hard look at whether this is also the right solution for OAS.
@mkistler
> If that's a problem, we could require that mixins specify their `type` (that was in my original proposal, but @darrelmiller suggested we make it optional). Making `type` required would mean the mixin content would not be "arbitrary", but in fact well-defined and could be validated.
`type` is actually not the issue at all.
Hmmm... I guess I'm just going to have to explain the generalized JSON Schema processing model, as it has developed in order to enable properly supporting non-validation vocabularies such as code, ui, and documentation generation. Seeing as OAS is one of the major motivations behind this (particularly for code and doc gen), it's worth a look anyway.
This is going to be long and someone will no doubt complain about that, but the short version does not seem to be getting across the full complexity of the problem.
It might take me a couple of days to get it written up. I've got about half of it so far but don't have more time to spend on it today.
@mkistler @darrelmiller @MikeRalphson @tedepstein given https://github.com/OAI/OpenAPI-Specification/pull/1865#issuecomment-472953005 it sounds like there's not much point in me writing up why this is such a problem for JSON Schema.
I'm still happy to do so, because I think it is important, but do let me know as it's substantial work to explain it all and I don't want to bother if this is a done deal.
Hi @handrews,
I'm not sure how the comment you referenced changes the situation. But I don't think you should invest a lot of time in this.
> > Could we think of traits (or mixins), and trait application, as a separate layer of processing, similar to a schema generator?
>
> That possibility for $merge was discussed extensively. In that case, it was not possible, because of how it interacts with the (necessarily) lazy evaluation of $ref. There is no way to pre-process all uses of $merge out.
I think I understand:

- `$ref` is supposed to be evaluated lazily, because the reference could resolve to a different value at runtime. We'd want that change to be reflected in the schema, and in any validation or other runtime behavior based on that schema.
- Traits can contain `$ref`, so traits are also dynamic. They're not just a design-time mechanism for generating static schemas.
- That dynamism is part of what makes `$merge` so problematic.

Do I have the right idea?
> I have not gone through all of the comments above, so perhaps $mixin does not have that problem, and is purely a static edit of the file. Perhaps that's what you mean by mechanical rather than semantic.
When I said that traits are "purely mechanical," that is probably not right. It's more accurate to say that traits would be applied before validation of the resulting, modified OAS document. And while it might be possible for the application of valid traits to a valid schema to produce an invalid schema, problems like that should be evident at design time.
I had not considered lazy evaluation of $refs, so maybe that changes things.
Also, we learned some things from @usarid on today's call:
We agreed on today's call to revisit the use cases driving discussions about Traits and Overlays. We want to make sure we have a representative set of use cases that cover the most common patterns, and determine which of these call for an internal composition feature (like traits) vs. external (like overlays), where information is being added at a different time, or by different parties, from the base document. Some use cases might reasonably call for both.
Once we have that, we should have a better sense of what traits and overlays might do. Until then maybe we don't need to go too deep into the Schema implications.
@tedepstein OK, after a lot of thought, I've distilled this down to a relatively concise explanation.
Part of the problem with `$merge` is potentially unexpected behavior as large systems grow and change.
If I have a schema that looks like:
```json
{
  "$id": "https://example.com/schemas/foo",
  "title": "Foo",
  "type": "object",
  "properties": {
    "specialProp1": {"type": "integer"},
    "specialProp2": {"type": "boolean"}
  },
  "additionalProperties": {"type": "string"}
}
```
I publish this schema as the schema that officially validates Foos.
You decide that you have a FooBar which is pretty close to being a Foo but has one more special property in it. So you `$merge` or `$mixin` or whatever:
```json
{
  "$id": "https://example.com/schemas/foobar",
  "title": "FooBar",
  "$mixinMergeThing": [
    {"$ref": "https://example.com/schemas/foo"},
    {
      "required": ["specialProp3"],
      "properties": {"specialProp3": {"type": "boolean"}}
    }
  ]
}
```
Because of how `properties` and `additionalProperties` interact, this has the effect that, if an instance has a property named `"specialProp3"`, then to validate as a Foo it would have to be a string, but to validate as a FooBar it would have to be a boolean.
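Concretely (instance invented for illustration):

```json
{
  "specialProp1": 1,
  "specialProp3": true
}
```

This instance is a valid FooBar (after the splice, `specialProp3` is declared as a boolean) but an invalid Foo (under the original schema, `specialProp3` falls through to `additionalProperties` and would have to be a string).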
With all of the current and planned features of JSON Schema, this is intentionally not possible. If you build on a Foo, then your derived schema MUST satisfy all of the constraints specified by the Foo schema.
But in this example, FooBar is derived from (in the sense of depending on / building on) Foo, and yet (due to the `required`) a valid FooBar is in fact never a valid Foo. You can, in fact, use this sort of keyword to slice things up and produce new schemas that have no clear relationship to the constituent schemas.
JSON Schema is a constraint system. A fundamental rule of such a constraint system is that you cannot lift constraints. You can add more, and that is how things are re-used. But you cannot lift them. `unevaluatedProperties` lets you do some complex things, but it is still adding constraints.
Once all relative URIs in the schema are resolved (interactions between `$id` and `$ref`), each schema object's constraints can be evaluated independent of any parent or sibling schemas. First you evaluate all subschemas, and then you evaluate the local keywords.
If you force some sort of `$merge` behavior into JSON Schema in the context of OpenAPI, then it is no longer a proper constraint system. While the independent evaluation of objects still technically exists in the form of the lazily evaluated merge results, schema authors cannot see those objects easily. In terms of what you can see, you can no longer trust that your schema object is evaluated independently.
The author of the Foo schema may not have any idea that there is a FooBar that splices their Foo schema. But now, instead of the Foo schema being a properly encapsulated description of valid Foos, it is just a source of keywords that can be rearranged arbitrarily. There is no encapsulation anymore.
I have spent pretty much the entire current draft cycle focused on keeping people from breaking JSON Schema's fundamental constraint and encapsulation design.
All of the work on modular extensibility, keyword classification, and `unevaluatedProperties` has been towards that goal. `unevaluatedProperties` is obvious, but the rest of it I have done in order to enable users (specifically OAS) to build things like code generation vocabularies out of annotations, and therefore not need `$merge` splicing features that ruin the constraint system in order to get the desired results.

That required supporting `unevaluatedProperties` and similarly dynamic keywords without breaking the fundamental approach. It has been a lot of work, and not just by me. But if OpenAPI decides to allow schema mixins... well, you're probably one of the biggest users of JSON Schema. People who are looking for shortcuts instead of building sustainable systems will demand the mixin feature be added to JSON Schema proper instead of learning all of the things that we did to build a better system.
I realize that not everyone cares about JSON Schema having a consistent, extensible, and elegant underlying model. Although I assert that having such an underlying model would make JSON Schema more successful in the long run as use cases grow and change. I certainly don't expect OpenAPI to consider this property of JSON Schema a goal.
But I hope this makes it clear why I'm not happy with this direction and how it is likely to impact JSON Schema if chosen.
Thanks for the lucid explanation @handrews. I think preserving the integrity of JSON Schema's processing model and composition semantics should be an important design goal for us.
I really cannot say much more without looking more carefully at use cases.
But I do think part of our problem is that we're (still) trying to use JSON Schema as a type definition language. A prototypical use case for mix-ins goes something like, "I want to add these properties to the object schema of the request body." But you're not really adding properties, you're adding constraints, which has a whole different set of implications. And the nature of the "adding" operation needs much more careful thought than we're accustomed to giving it.
I'm opening this issue simply as a place to collect some ideas about how the concepts of Overlays and Traits might be brought together.
In both proposals, I think the key notion is a "fragment", which I would describe as a "sparse" sub-object of an OpenAPI definition. In the Overlay proposal, a fragment is the `value` of an "Update Object" and has a type of `any`.

I think fragments -- which I would like to call "mixins" -- can have a more well-defined structure than just `any`. If we use the `discriminator` approach already present in OpenAPI for "mixins", we can require (and validate) conformance to a particular structure. In particular, we can require a mixin to be a "sparse" form of any well-defined OpenAPI object, e.g. Operation, Response, Parameters, or even the whole OpenAPI definition.

Mixins could be defined as just another flavor of "component".

Note *: "sparse" here means all props are optional.

Mixins could then be included effectively anywhere in the API doc by reference. By virtue of the mixin type, it could be validated as allowed or not allowed at the point it is referenced.
Now Overlays can become simply a mechanism for inserting mixins and mixin references into an API document. The JMESPath mechanism of overlays still provides the ability to apply a single update to multiple target objects using wildcards, but that update would now be expressed as simply adding a "mixin" to each of the target objects.
These are just strawman ideas and I do not claim to have thought them through in any detail, but I hope they can serve as useful seeds for discussion.
Examples
Mixins are a recasting of "Traits" as described in OAI/Overlay-Specification#38. Here's how I imagine mixins could be used to apply a "pageable" trait to operations.
The "pageable" mixin would be defined in the `components/mixins` section of the API doc, and an operation would "apply" the "pageable" mixin with a `$mixin` property. The application of the mixin to the operation would yield an operation with the mixin's content merged in.
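The original examples were not captured here; the following is a hedged sketch of what they might have looked like (the parameter names, schemas, and path are my own invention):

```yaml
components:
  mixins:
    pageable:
      type: operation
      content:
        parameters:
          - name: offset
            in: query
            schema: {type: integer}
          - name: limit
            in: query
            schema: {type: integer}

paths:
  /things:
    get:
      summary: List things
      $mixin: /components/mixins/pageable
```

After application, the `get` operation would contain the `offset` and `limit` parameters directly, as if they had been written inline.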