Here are my initial thoughts:
At a high level, the new JSON schema is very flexible. The core spec defines the idea of a "vocabulary" - a set of properties which have a documented meaning in a JSON schema - and a "dialect" - a set of vocabularies which are supported.
The core spec defines a core vocabulary and two vocabularies for applying subschemas. The validation spec defines vocabularies for validating the structure and contents of a JSON document, a vocabulary for annotating a schema with metadata (like a description), and a dialect which includes all the required vocabularies of the core and validation specs.
The OpenAPI spec then defines its own vocabulary containing some extensions, and a dialect which requires the required JSON schema vocabularies and the OpenAPI vocabulary.
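To make that concrete, a dialect is declared by a meta-schema that lists its vocabularies with `$vocabulary`. A rough sketch (the `$id` is invented; the vocabulary URIs are the 2020-12 core, applicator and validation vocabularies):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/my-dialect",
  "$vocabulary": {
    "https://json-schema.org/draft/2020-12/vocab/core": true,
    "https://json-schema.org/draft/2020-12/vocab/applicator": true,
    "https://json-schema.org/draft/2020-12/vocab/validation": true
  }
}
```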
By default, all schemas use the dialect defined by OpenAPI, but they can also choose to use a different dialect by using the `$schema` property in their schema, or by setting `jsonSchemaDialect` at the top level, which sets the default dialect for all schema objects in the document.
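For example, a minimal sketch of an OpenAPI 3.1 document (the `Pet` schema name is just a placeholder) that sets the default dialect and then overrides it for one schema:

```json
{
  "openapi": "3.1.0",
  "info": { "title": "Example", "version": "1.0" },
  "jsonSchemaDialect": "https://spec.openapis.org/oas/3.1/dialect/base",
  "paths": {},
  "components": {
    "schemas": {
      "Pet": {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object"
      }
    }
  }
}
```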
In theory, a user can write their OpenAPI document using any dialect of JSON schema they like, as long as they declare it in the document. If we want to be able to read any OpenAPI document the user might package in their application, our model would need to handle any arbitrary JSON document as a schema.
Here are the things that I think have changed in the schema between 3.0 and 3.1 that would need to be reflected in our model:
- `$schema` (string) - identifies the dialect in use for a schema
- `$comment` (string)
- `if`, `then`, `else` (schema) - if the object validates against the `if` schema, then it must also validate against the `then` schema, otherwise it must validate against the `else` schema
- `dependentSchemas` (object(propertyname -> schema)) - if the object has a property with the given name, then the object must validate against the schema
- `prefixItems` (array[schema]) - if the object is an array, the first item must validate against the first schema in `prefixItems`, the second must validate against the second schema, etc.
- `contains` (schema) - if the object is an array, at least one item in the array must match the schema
- `patternProperties` (object(regex -> schema)) - if a property name matches the regex, then the property value must validate against the schema
- `propertyNames` (schema) - each property name in the object must validate against the schema
- `unevaluatedItems` (schema) - each array item not matched by `prefixItems`, `items` or `contains` must validate against the schema
- `unevaluatedProperties` (schema) - each property value not matched by `properties`, `patternProperties` or `additionalProperties` must match the schema
  - Similar to `additionalProperties`, but `unevaluatedProperties` won't check properties which have been matched by a subschema applied with `allOf`, `oneOf`, `then`, `else` etc., whereas `additionalProperties` will.
- `$ref` - not mutually exclusive with other properties
- `additionalProperties` - must now be a schema; previously a boolean was also allowed (though a schema itself is now allowed to be a boolean)
- `exclusiveMinimum`, `exclusiveMaximum` - now numbers, previously booleans
- `readOnly`, `writeOnly` - now valid anywhere, previously only valid where the schema describes an object property
- `const` - specifies that the object must have a specific value
- `maxContains`, `minContains` (integer) - specifies that `contains` must match between `minContains` and `maxContains` items in the array
- `dependentRequired` (object(string -> array[string])) - describes property names that, if present, require certain other property names to also be present
- `contentEncoding` (string) - specifies that a string represents encoded binary data with the given encoding type (e.g. base64)
- `contentMediaType` (string) - specifies that the content of a string has the given media type
- `contentSchema` (schema) - if `contentMediaType` is a media type that maps into JSON Schema's data model, this property specifies a schema that the data in the string must conform to
- `type` - previously a string, now may also be an array of strings; `"null"` is now a valid value here
- `example` -> `examples` - more than one example is now allowed
- `nullable` - now expressed by including `"null"` in the array of valid types
- `true` or `false` values are now valid as schemas
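To make a few of these concrete, here is a sketch of a single 2020-12 schema (with invented property names) exercising several of the new keywords:

```json
{
  "type": ["object", "null"],
  "properties": {
    "format": { "const": "image" },
    "data": {
      "type": "string",
      "contentEncoding": "base64",
      "contentMediaType": "image/png"
    },
    "tags": {
      "type": "array",
      "prefixItems": [{ "type": "string" }],
      "contains": { "const": "public" },
      "maxContains": 1
    }
  },
  "if": { "required": ["data"] },
  "then": { "required": ["format"] },
  "unevaluatedProperties": false
}
```

Note that the `false` used for `unevaluatedProperties` is itself a schema, illustrating the last point above.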
The following fields are new in the core schema, but I'm not sure if they're relevant to its use in OpenAPI:
- `$id` - the canonical URI of a schema
- `$anchor` - this is used to name an element in a schema and reference it with this name using `$ref` elsewhere
- `$defs` - allows for defining parts of a schema for re-use later (similar to `components`)
- `$dynamicAnchor`, `$dynamicRef` - allows for schemas split across different documents to extend each other
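A sketch of how these fit together (the `$id` URI and property names are invented):

```json
{
  "$id": "https://example.com/schemas/order",
  "$defs": {
    "address": {
      "$anchor": "address",
      "type": "object",
      "properties": { "street": { "type": "string" } }
    }
  },
  "type": "object",
  "properties": {
    "billing": { "$ref": "#address" },
    "shipping": { "$ref": "#/$defs/address" }
  }
}
```

Here `billing` references the `address` subschema by its anchor name, while `shipping` reaches the same subschema by JSON pointer.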
I don't think users of OpenAPI are likely to want to use these fields, but we need to accommodate them anyway: they're valid, so we must be able to read them from a user-supplied document.
However, since OpenAPI permits arbitrary dialects, our model may need to allow arbitrary JSON as the schema anyway. If we include a mechanism to allow arbitrary additional properties, we could say that these properties can only be set through that mechanism.
A few things I noticed while trying to implement this for smallrye:
The semantics of `nullable` vs. having `null` in the list of types are subtly different. While `nullable` can be `true`, `false` or unset, `null` is either in the list of types or it isn't. Also, `nullable = true` has no effect if `type` is not set, whereas you could previously do this for an optional field:
```json
{
  "nullable": true,
  "allOf": [
    { "$ref": "..." }
  ]
}
```
With the new schema this requires an `anyOf`.
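Something along these lines should be equivalent in 3.1 (a sketch; the `$ref` target is left elided as in the example above):

```json
{
  "anyOf": [
    { "type": "null" },
    { "$ref": "..." }
  ]
}
```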
The `Extensible` interface is now fairly clear throughout that it only works on properties beginning with `x-`. `addExtension` can be interpreted as adding `x-` to the start of any key which doesn't already have that prefix. Having `Extensible` work this way allows for consistent handling of types which support extensions, so I think we should leave it as it is and create a new interface for "freeform" objects.
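For example (hypothetical keys, assuming the prefixing behaviour described above), calling `addExtension("vendor-feature", true)` and `addExtension("x-other", 1)` would serialize along these lines:

```json
{
  "x-vendor-feature": true,
  "x-other": 1
}
```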
Wouldn't having `null` absent from the `type` array be semantically equivalent to `nullable: false` or undefined/unset?
For the second issue, I think omitting `type` entirely implies any type, including `null`. Basically, the value is unconstrained.
Yes, you can get a semantically equivalent end result, but it makes it difficult to keep existing code which uses the interface working.
At the moment, I've deprecated `setNullable` and the single-argument `setType`, thinking that they could both be implemented by manipulating the list of types. However, you can't quite do that consistently. If the user calls `setNullable` but never calls `setType`, you don't want to end up with `"type": ["null"]` in the schema, since that forbids anything that's not `null`. However, if they do call `setType(OBJECT)`, you do want `"type": ["null", "object"]`, so you would need to store a flag somewhere to say that nullable has been requested.
Whether that's an issue or not depends on how you implement freeform objects. I tried making the `Schema` implementation a thin wrapper around a JSON object, but then you have nowhere to store data except within the JSON.
OAS 3.1.0 changes from supporting most of an older JSON schema draft to supporting the whole of the JSON Schema 2020-12 draft Core and Validation.
We need to work out:

- `Schema` model class to support the 3.1.0 schema
- `@Schema` and related annotations in order to allow users to take advantage of the new functionality that is available

Tasks:

- `Schema` model
- `@Schema` annotation to expose new and changed parts of the schema model #601
- `@Schema` attributes #601