"$id" and "$ref" changes in JSON Schema draft 2019-09 and OAS 3.1

APIDevTools / json-schema-ref-parser

Parse, Resolve, and Dereference JSON Schema $ref pointers in Node and browsers

https://apitools.dev/json-schema-ref-parser

MIT License

"$id" and "$ref" changes in JSON Schema draft 2019-09 and OAS 3.1 #145

Open handrews opened 4 years ago

handrews commented 4 years ago

JSON Schema draft 2019-09 (formerly known as draft-08) introduced several changes around $ref and $id. This draft is being adopted by the OpenAPI Specification version 3.1.

The plain-name fragment declaration function of $id was split into a separate keyword, $anchor
- "$id": "#foo" becomes "$anchor": "foo"
- note lack of # as the value of $anchor is just the name
- It is still referenced as "$ref": "#foo" or "$ref": "https://example.com/some-schema#foo"
$id itself MUST resolve to an absolute URI (no fragment)
- Any schema with an $id is therefore a full resource with its own URI, even if embedded in another resource
- An empty JSON Pointer fragment, e.g. https://example.com/some-schema#, is allowed because it is equivalent to not having a fragment
- This is also because of historical usage in older meta-schemas
$ref can now have other keywords beside it
- The result of the $ref keyword is simply the result of the reference schema
- That result is combined with other keyword results in the usual way
- This means that it is not any sort of merge
- {"type": "object", "$ref": "foo"} is equivalent to {"type": "object", "allOf": [{... contents of foo schema ...}]}
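The first change in the list above, moving plain-name fragments from $id to $anchor, can be sketched as a small migration helper. This is an illustrative sketch only: the function name `migrate_plain_name_ids` is hypothetical, it is not part of json-schema-ref-parser, and it naively walks every object, so it would also rewrite an "$id" that appears inside instance data such as "const" or "enum" values. It leaves an $id that combines a base URI with a fragment untouched.

```python
def migrate_plain_name_ids(node):
    """Return a copy of `node` with every "$id": "#name" rewritten as "$anchor": "name"."""
    if isinstance(node, list):
        return [migrate_plain_name_ids(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        is_plain_name = (
            key == "$id"
            and isinstance(value, str)
            and value.startswith("#")
            and len(value) > 1
            and not value.startswith("#/")  # JSON Pointer fragments are not plain names
        )
        if is_plain_name:
            out["$anchor"] = value[1:]  # the value of $anchor is just the name, without "#"
        else:
            out[key] = migrate_plain_name_ids(value)
    return out


old = {"$defs": {"a": {"$id": "#foo", "type": "string"}}, "$ref": "#foo"}
print(migrate_plain_name_ids(old))
```

References are left alone, since "$ref": "#foo" is written the same way in both drafts.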

Since this library has not previously supported "$id": "#foo", if you only support one form I would suggest "$anchor": "foo" since that is where OpenAPI is headed.

See also https://github.com/OAI/OpenAPI-Specification/issues/2092 for further details on how this fits into OAS 3.1, as obviously OAS Reference Objects outside of Schema Objects don't know about allOf, etc.

Feel free to ask questions on the JSON Schema slack. I will also keep an eye on this issue.

JamesMessinger commented 4 years ago

Thank you so much for all the detailed information! Sounds like I have a lot of work ahead of me, but I like the direction that JSON Schema and OpenAPI are heading 👍

JamesMessinger commented 4 years ago

@handrews - I've started working on implementing JSON Schema 2019-09, and I'd like to know more details about how $ref is supposed to work alongside other keywords. In your comment above you said:

This means that it is not any sort of merge

{"type": "object", "$ref": "foo"} is equivalent to {"type": "object", "allOf": [{... contents of foo schema ...}]}

I have 2 questions about this.

1) Where is this defined in the spec? The closest I can find is Section 7.7.1.1, but it doesn't explicitly state either of the points you mentioned. I was hoping for the spec to explicitly explain how to interpret $ref alongside other keywords, and how to handle edge cases.

2) How should conflicting values be handled? In my experience, nearly everyone who uses $ref alongside other keywords expects the other keywords to override the corresponding keywords of the referenced schema. If I understand correctly, interpreting $ref as an allOf doesn't meet this expectation. Instead, the keywords of both the referencing and the referenced schema are applied, which can introduce mutually-exclusive conflicts. Here's an example:

{
  "$defs": {
    "field": {
      "properties": {
        "name": { "type": "string" },
        "tabIndex": { "type": "number" },
        "value": { "type": "string" }
      }
    },
    "checkbox": {
      "properties": {
        "$ref": "#/$defs/field/properties",
        "value": { "type": "boolean" }
      }
    }
  }
}

In this example, the author most likely intends for the "checkbox" schema to be interpreted like this:

{
  "properties": {
    "name": { "type": "string" },
    "tabIndex": { "type": "number" },
    "value": { "type": "boolean" }
  }
}

But if I understand you correctly, the spec intends for the "checkbox" schema to be interpreted like this:

{
  "properties": {
    "value": { "type": "boolean" },
    "allOf": [
      {
        "name": { "type": "string" },
        "tabIndex": { "type": "number" },
        "value": { "type": "string" }
      }
    ]
  }
}

Of course, the problem with this interpretation is that the value property has two mutually-exclusive types.
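To make the conflict concrete, here is a minimal sketch that applies both type constraints to a handful of candidate values. The helpers `matches_type` and `satisfies_all` are hypothetical and cover only the three type names used in this example; they are not a JSON Schema validator.

```python
def matches_type(value, type_name):
    """Check a value against a small subset of JSON Schema "type" names."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"unsupported type: {type_name}")


def satisfies_all(value, type_names):
    """A value passes only if it satisfies every constraint, as with allOf."""
    return all(matches_type(value, name) for name in type_names)


candidates = ["yes", True, False, 0, 1.5]
survivors = [v for v in candidates if satisfies_all(v, ["boolean", "string"])]
print(survivors)  # []
```

No value is both a boolean and a string, so under this interpretation every instance that supplies "value" fails validation.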

I appreciate any clarification or guidance you can give me. I'm trying to make sure I implement the spec correctly (or as closely as possible) without breaking people's expectations about using $ref to extend and override schemas.

handrews commented 4 years ago

@JamesMessinger thanks for digging into this! I know it poses more of a challenge for your packages than for most implementations that focus on validation.

There are a lot of reasons why $ref is not a merge. One reason is that it is always possible to get the correct end result without a merge by refactoring your schemas.

The other reasons have to do with a lot of work we did to clarify the processing model for applying a schema to an instance, so that people can write extension keywords with a reasonable expectation that they will work across implementations. The lack of broadly usable extensions has been a huge limiting factor on finalizing JSON Schema, as everyone wants their favorite keyword in the standard.

So we needed to be able to say "keyword behaviors are within these boundaries", and supporting merges would have made that much harder. In conforming validators (that handle $refs as they are encountered), the resulting behavior is always like your last example. So that's something we could not change.

Fundamentally, JSON Schema is a constraint system, and you can only ever add constraints when you combine schema objects. You cannot remove constraints. Allowing removal would complicate a lot of things. There are other reasons, and well over 500 GitHub comments on the topic. It was a Huge Thing. In fact it was what nearly killed the project after draft-04.

So practical advice: Since you are (correct me if I'm wrong) preprocessing JSON files with references, rather than applying schemas, you have more options.

I don't know what to do about folks who expect an assertion (like type) to be overridden through $ref. That means that their input and output schemas will have different validation behavior but I guess they want that? I suppose you could keep supporting it as an option.

For annotations like title, the idea is that the application (whoever consumes the applied schema+instance) would decide how to handle multiple annotation values. If you wanted to have an option to override them, that would seem valid. In that sense, the ref parser is more of an application than a validator- it's just one that runs outside of validation.
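As a rough illustration of that idea, an application could collect every value of an annotation and then apply its own policy. The function `collect_annotation` and the "referencing schema wins" policy below are assumptions made for this sketch, not behavior defined by the spec or by this library.

```python
def collect_annotation(referencing, referenced, keyword):
    """Gather every value of an annotation keyword, referencing schema first."""
    values = []
    for source in (referencing, referenced):
        if keyword in source:
            values.append(source[keyword])
    return values


referencing = {"title": "Checkbox value", "$ref": "#/$defs/field"}
referenced = {"title": "Field", "type": "string"}

titles = collect_annotation(referencing, referenced, "title")
# Application-level policy (an assumption): prefer the referencing schema's value.
preferred = titles[0] if titles else None
print(titles, preferred)
```

Both values remain available, so a different consumer is free to pick the referenced schema's title or to show both.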

You could also add your own keyword that allowed overrides, but it would not ever be a JSON Schema keyword and validators would never handle it correctly. I'd recommend calling it %override because the % would highlight that there's something unusual going on. (Also, $merge has epic baggage attached to it that you do not want- the OpenAPI folks are converging OAS 3.1 with the next draft, so they will not be adopting $merge either).

We do expect some challenges as this all gets rolled out to tools that do code generation or other non-instance-based actions, which is why we worked closely with the OpenAPI Technical Steering Committee to see if they thought it would work.

Does any of this help?

Side note: If you were planning to do anything with $recursiveRef, don't. It gets replaced with $dynamicRef in the next draft which actually makes sense- $recursiveRef was so confusing even I couldn't use it correctly and I came up with the damn thing.

JamesMessinger commented 4 years ago

Thanks for the detailed response. It helped me understand the thinking behind the $ref behavior, especially how merging/overriding would break expectations of other parts of JSON Schema. I like your point that it's always possible to get the correct end result without a merge by refactoring your schemas. Perhaps what I really need to do is provide documentation and examples to my users, explaining how to refactor their schemas to achieve the result they expect, while still complying with the spec.
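One possible refactoring of the earlier field/checkbox example, sketched under stated assumptions: the shared properties move into a definition that omits "value", and each schema adds its own "value" constraint alongside a $ref to the shared part. The "baseField" name and the tiny `is_valid` checker are hypothetical and support only "$ref" to local pointers, "type", and "properties".

```python
SCHEMA = {
    "$defs": {
        # Shared properties, deliberately without "value"
        "baseField": {
            "properties": {
                "name": {"type": "string"},
                "tabIndex": {"type": "number"},
            }
        },
        "field": {
            "$ref": "#/$defs/baseField",
            "properties": {"value": {"type": "string"}},
        },
        "checkbox": {
            "$ref": "#/$defs/baseField",
            "properties": {"value": {"type": "boolean"}},
        },
    }
}

TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def lookup(pointer):
    """Resolve a local "#/a/b" pointer against SCHEMA (no escape handling)."""
    node = SCHEMA
    for token in pointer[2:].split("/"):
        node = node[token]
    return node


def is_valid(instance, schema):
    """Apply "$ref", "type" and "properties"; each keyword only adds constraints."""
    if "$ref" in schema and not is_valid(instance, lookup(schema["$ref"])):
        return False
    if "type" in schema and not TYPE_CHECKS[schema["type"]](instance):
        return False
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not is_valid(instance[name], subschema):
                return False
    return True


checkbox = SCHEMA["$defs"]["checkbox"]
print(is_valid({"name": "agree", "tabIndex": 1, "value": True}, checkbox))   # True
print(is_valid({"name": "agree", "tabIndex": 1, "value": "yes"}, checkbox))  # False
```

Because nothing is overridden, the result is the same whether a tool treats the $ref as a reference, as an allOf, or inlines it.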

I really appreciate the tip about $recursiveRef. I was intentionally saving it till last, because it looked so complicated and confusing. It's great to know that I can just ignore it and wait for $dynamicRef to (hopefully 🤞) make things simpler.

handrews commented 4 years ago

@JamesMessinger I'm glad it helped!

Regarding $dynamic*:

$dynamicAnchor creates a plain-name fragment just like $anchor (formerly the #foo form of $id)
plain-name fragments are scoped to a whole resource, so each time you process a schema resource you have to scan it for $anchor and $dynamicAnchor

When referencing a URI including a plain-name fragment, the behaviors are:

$ref to an $anchor fragment: normal URI behavior
$ref to a $dynamicAnchor fragment: normal URI behavior
$dynamicRef to an $anchor fragment: normal URI behavior
$dynamicRef to a $dynamicAnchor fragment: special runtime behavior

In the special runtime behavior case, if there's a $dynamicAnchor with the same name higher in your dynamic scope, then you resolve to that URI instead of the one you would normally resolve to.

In practical terms: Generally you are given a set of schema resources (which may be 1:1 with a set of files or may have multiple resources per file because of $id), you scan all of them for $anchor and $dynamicAnchor, because you need to be able to recognize URIs using those fragments.

This step is not new- the "$id": "#foo" form required the same thing, it just looks like "$anchor": "foo" now.

This is the new part: You want to separately keep track of which resources have $dynamicAnchors, and with what names.
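A minimal sketch of that scanning step, assuming each resource is already known by its absolute URI. The function name `scan_anchors` is hypothetical. It stops at embedded resources (subschemas with their own $id), since plain-name fragments are scoped to a single resource, and it walks every object naively, so instance data inside "const" or "enum" could be misread as a schema.

```python
def scan_anchors(resource_uri, schema):
    """Index the plain-name fragments of one schema resource.

    Returns (anchors, dynamic_names): `anchors` maps "uri#name" to the
    subschema, `dynamic_names` holds the names declared via $dynamicAnchor.
    """
    anchors, dynamic_names = {}, set()

    def walk(node, is_root):
        if isinstance(node, list):
            for item in node:
                walk(item, False)
            return
        if not isinstance(node, dict):
            return
        if "$id" in node and not is_root:
            return  # embedded resource: it owns its own anchors
        for keyword in ("$anchor", "$dynamicAnchor"):
            name = node.get(keyword)
            if isinstance(name, str):
                anchors[f"{resource_uri}#{name}"] = node
                if keyword == "$dynamicAnchor":
                    dynamic_names.add(name)  # tracked separately, as described above
        for value in node.values():
            walk(value, False)

    walk(schema, True)
    return anchors, dynamic_names


tree = {
    "$id": "https://example.com/tree",
    "$dynamicAnchor": "node",
    "$defs": {
        "label": {"$anchor": "label", "type": "string"},
        "other": {"$id": "https://example.com/other", "$anchor": "hidden"},
    },
}
anchors, dynamic_names = scan_anchors("https://example.com/tree", tree)
print(sorted(anchors), dynamic_names)
```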

When you evaluate a schema with an instance, you do some sort of depth-first traversal of the schema, with the exact path of that traversal, and the depth of cycles, determined by the instance. This is because you can't determine the result of a schema object without evaluating all of its subschemas (or, rather, all of the ones that apply to this particular instance). This depth-first traversal defines your dynamic scope.

So as you do that traversal, when you descend into a new schema resource, if that resource has a $dynamicAnchor that isn't already present in your dynamic scope, remember it. For as long as it is in your dynamic scope, if a $dynamicRef points to a $dynamicAnchor of that name in any resource, instead of resolving where it normally would, you resolve to this "remembered" URI.

So basically it lets you substitute the same fragment name on a different resource at runtime. This is really useful in recursive meta-schemas. It is not all that useful in other scenarios.

When you are done with the resource where you first saw that $dynamicAnchor (your traversal is about to return to a parent in some other resource), then you "forget" that $dynamicAnchor as it is no longer in your dynamic scope.

If you $dynamicRef a $dynamicAnchor and there isn't any other $dynamicAnchor higher in the dynamic scope with the same name, then the reference behaves normally. (Or, it still substitutes with the first resource in the dynamic scope, but that's the same resource so it doesn't matter 😆 whichever seems more sensible to you!)
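The resolution rule described above can be sketched as follows. This models only the description in this comment, not the final spec wording: the dynamic scope is a list of resource URIs ordered from outermost to innermost, and `dynamic_anchors` maps each resource URI to the names it declares with $dynamicAnchor. The function name and the example URIs are hypothetical.

```python
def resolve_dynamic_ref(target_uri, dynamic_scope, dynamic_anchors):
    """Resolve a $dynamicRef whose normal (static) target is `target_uri`."""
    resource, _, name = target_uri.partition("#")
    if name not in dynamic_anchors.get(resource, set()):
        # Target is a plain $anchor or a JSON Pointer: normal URI behavior
        return target_uri
    for scope_resource in dynamic_scope:  # outermost resource first
        if name in dynamic_anchors.get(scope_resource, set()):
            return f"{scope_resource}#{name}"  # first "remembered" anchor wins
    return target_uri


dynamic_anchors = {
    "https://example.com/tree": {"node"},
    "https://example.com/strict-tree": {"node"},
}

# Evaluation entered strict-tree first, then followed a reference into tree.
scope = ["https://example.com/strict-tree", "https://example.com/tree"]
print(resolve_dynamic_ref("https://example.com/tree#node", scope, dynamic_anchors))
# https://example.com/strict-tree#node

# Evaluation started in tree itself: nothing higher up, so the reference behaves normally.
print(resolve_dynamic_ref("https://example.com/tree#node", ["https://example.com/tree"], dynamic_anchors))
# https://example.com/tree#node
```

Popping a resource off `dynamic_scope` when the traversal leaves it corresponds to "forgetting" its $dynamicAnchor.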

Hopefully this will be really clear in the next draft. I think most of the wording is in, but I'm not sure (it's been a lot of start-stop work with the pandemic and whatnot). But that's a quick overview.

philsturgeon commented 3 years ago

Interesting comments on #22 related to URIs and $id https://github.com/APIDevTools/json-schema-ref-parser/issues/22#issuecomment-412013980

ajv has this sort of functionality.

You can add any number of schemas to the instance of the validator after specifying the subject schema, and then references will resolve based on the $id of other schemas that have been added, before making any HTTP calls.

The use of $id and $ref in the above schema is not exactly how it's supposed to work. The $id must be a URI, and those $ids are not.

URI resolution is HARD to understand! I've tried. It's easier to think about in terms of a link on a web page. If the URL of the page you are on is X, and a link on that page is Y, how does that apply? It's the same with $ref resolution in a schema file.
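The web-page analogy maps directly onto standard relative-reference resolution. A small sketch using Python's standard library, where the base URI plays the role of the page you are on and the $ref value plays the role of the link (the wrapper name `resolve_ref` and the example URIs are made up for illustration):

```python
from urllib.parse import urljoin


def resolve_ref(base_uri, ref):
    """Resolve a $ref value against the base URI of the schema that contains it."""
    return urljoin(base_uri, ref)


base = "https://example.com/schemas/root.json"
print(resolve_ref(base, "item.json"))          # https://example.com/schemas/item.json
print(resolve_ref(base, "../common/id.json"))  # https://example.com/common/id.json
print(resolve_ref(base, "/abs/other.json"))    # https://example.com/abs/other.json
print(resolve_ref(base, "#foo"))               # https://example.com/schemas/root.json#foo
```

An $id in a schema changes the base URI for everything beneath it, in the same way that following a link changes the page you are on.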

What's needed to resolve this issue?
* [ ]  There needs to be an interface that allows the user to add JSON Schema files to an index to be used later.
  Preferably, "load in all schemas in this directory and descendant directories"; I have 4 levels of schemas in a spec I've created.

* [ ]  Any combining or bundling needs to be aware of $id resetting the base URI for that schema.
  I don't know how this is currently handled, but you are already able to make HTTP requests to resolve schemas, and this could be many layers deep. I'm assuming this already works correctly. I've not checked the tests (I should do that).

* [ ]  When attempting to transclude a referenced schema, look up the URL to see if it's already included in the index, and if so, use that JSON Schema instead of making an HTTP request.

Based on the above and some initial digging, the code for loading and storing an index of JSON Schema files could PROBABLY be lifted from ajv, and resolution COULD be achieved by creating a new resolver which is always run first if additional files have been loaded this way.
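The three checklist items can be sketched together as an index that is consulted before any fetch. This is a hypothetical design, not ajv's or this library's implementation: the class name `SchemaIndex`, its methods, and the `fetch` callback are all assumptions, and the walk treats any object with a string "$id" as a schema resource.

```python
from urllib.parse import urldefrag, urljoin


class SchemaIndex:
    """In-memory index of schema resources keyed by absolute URI."""

    def __init__(self, fetch):
        self._resources = {}
        self._fetch = fetch  # fallback callable, e.g. an HTTP download

    def add(self, retrieval_uri, document):
        """Register a document plus every embedded resource that declares "$id"."""
        retrieval_uri = urldefrag(retrieval_uri)[0]
        self._resources[retrieval_uri] = document
        self._walk(document, retrieval_uri)

    def _walk(self, node, base_uri):
        if isinstance(node, list):
            for item in node:
                self._walk(item, base_uri)
        elif isinstance(node, dict):
            if isinstance(node.get("$id"), str):
                # "$id" resets the base URI for this schema and its children
                base_uri = urldefrag(urljoin(base_uri, node["$id"]))[0]
                self._resources[base_uri] = node
            for value in node.values():
                self._walk(value, base_uri)

    def resolve(self, uri):
        """Return the resource for `uri`, checking the index before fetching."""
        resource_uri = urldefrag(uri)[0]
        if resource_uri not in self._resources:
            self.add(resource_uri, self._fetch(resource_uri))
        return self._resources[resource_uri]


def refuse_network(uri):
    raise LookupError(f"not indexed, and fetching is disabled: {uri}")


index = SchemaIndex(fetch=refuse_network)
index.add("file:///schemas/root.json", {
    "$id": "https://example.com/root",
    "$defs": {"item": {"$id": "item", "type": "string"}},
})
print(index.resolve("https://example.com/item#foo"))  # found locally, no fetch
```

Loading a whole directory tree would then amount to calling `add` once per file. Because keys are plain strings, URN identifiers can be indexed the same way when given as absolute $id values.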

This also applies to schemas which are defined with URNs. Uncommon, but in use.

If you want a sample set of schemas to work with, feel free to use these.

A note on terminology: Strictly speaking, dereferencing is the way in which you determine what a reference is referring to, because the process involves URI resolution.

Including one schema in another schema is called "transclusion".

Helpful distinction when you're trying to talk about two similar but subtly different events.

Combining those issues so we can get $id working for JSON Schema 2019-09 and beyond.

JamesMessinger commented 3 years ago

@philsturgeon - This functionality is implemented (for both Draft 4 and 2019-09) in JSON Schema Reader, or, rather, in the internal library that it uses under the hood.

The key is that these libraries distinguish between files and resources. A file is a physical file on disk or a URL that has been downloaded, whereas a resource URI identifies a single JSON Schema which may or may not correlate to a single file. This is all documented here.

NOTE: Apologies to anyone who is clicking the above links and getting 404s. Those repos are still WIP and haven't been made public yet

Relequestual commented 3 years ago

clicks links reads rest of comment Oh 😅

ekzobrain commented 5 months ago

Hi. Is there any news on this? With the latest lib version and 2020-12 schemas, $ref is resolved incorrectly. This schema:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "properties": {
    "native_prop": {}
  },
  "$ref": "#/$defs/def",
  "$defs": {
    "def": {
      "properties": {
        "refed_prop": {}
      }
    }
  }
}

Is dereferenced to:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "properties": {
    "native_prop": {}
  },
  "$defs": {
    "def": {
      "properties": {
        "refed_prop": {}
      }
    }
  }
}

While it should be:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "properties": {
    "native_prop": {}
  },
  "allOf": [
    {
      "properties": {
        "refed_prop": {}
      }
    }
  ],
  "$defs": {
    "def": {
      "properties": {
        "refed_prop": {}
      }
    }
  }
}

So $ref should just be replaced with allOf (if the schema version is 2019-09 or 2020-12) to conform to the spec.
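A minimal sketch of that rewrite, under several assumptions: it handles only same-document JSON Pointer references, does not percent-decode fragments, inlines one level (a $ref inside the inlined target is left as-is, so circular references do not loop), and treats any string-valued "$ref" key as a reference. The function names are hypothetical and this is not the library's dereferencing code.

```python
import copy


def resolve_pointer(root, ref):
    """Resolve a same-document JSON Pointer fragment such as "#/$defs/def"."""
    if ref != "#" and not ref.startswith("#/"):
        raise ValueError(f"only same-document JSON Pointers are handled: {ref}")
    node = root
    for token in ref[1:].split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def ref_to_all_of(schema, root=None):
    """Return a copy in which each "$ref" becomes an "allOf" entry."""
    root = schema if root is None else root
    if isinstance(schema, list):
        return [ref_to_all_of(item, root) for item in schema]
    if not isinstance(schema, dict):
        return schema
    ref = schema.get("$ref")
    is_ref = isinstance(ref, str)
    out = {
        key: ref_to_all_of(value, root)
        for key, value in schema.items()
        if not (is_ref and key == "$ref")
    }
    if is_ref:
        target = copy.deepcopy(resolve_pointer(root, ref))
        # Keep any existing allOf entries; the referenced schema is one more constraint
        out["allOf"] = [target] + list(out.get("allOf", []))
    return out


schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "properties": {"native_prop": {}},
    "$ref": "#/$defs/def",
    "$defs": {"def": {"properties": {"refed_prop": {}}}},
}
print(ref_to_all_of(schema))
```

Applied to the schema above, this yields the expected result shown earlier, with "native_prop" kept and "refed_prop" added through allOf.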

philsturgeon commented 5 months ago

Nope, this is not being worked on. As mentioned elsewhere, this tool has a misleading name and is essentially "oas3.0-ref-parser", so it's really not geared up for that, and literally nobody involved in maintaining this software has time to rewrite it to support that.

Scalar's openapi-parser is promising for OAS3.1 users and onwards, but we are not maintaining a generic JSON Schema tool here.

If you'd like to lead the charge on https://github.com/APIDevTools/json-schema-reader/ please let me know and I can set you up.