Closed: pbryan closed this issue 4 months ago.
In recent versions of JSON Schema, this is handled by `contentMediaType` and `contentEncoding`: https://tools.ietf.org/html/draft-handrews-json-schema-validation-01#section-8
These concepts have been part of JSON Schema since before OpenAPI, but under various names and at times in the Hyper-Schema spec (despite having nothing to do with hyperlinks).
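To make the split between the two keywords concrete, here is a minimal Python sketch (not taken from either spec). It assumes a schema `{"type": "string", "contentMediaType": "image/png", "contentEncoding": "base64"}` and a hypothetical helper `decode_instance`: `contentEncoding` says how to get bytes back out of the JSON string, and `contentMediaType` says how to interpret those bytes. The PNG signature check is a crude stand-in for real media-type handling, and whether a validator does any of this automatically depends on the implementation.

```python
import base64
import json

# Assumed schema: binary PNG data carried inside a JSON string as base64.
SCHEMA = {
    "type": "string",
    "contentMediaType": "image/png",
    "contentEncoding": "base64",
}

# The fixed 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_instance(schema: dict, instance: str) -> bytes:
    """Hypothetical helper: recover raw bytes from a JSON string instance."""
    # contentEncoding: how the bytes were turned into JSON-safe text.
    if schema.get("contentEncoding") != "base64":
        raise ValueError("this sketch only handles base64")
    data = base64.b64decode(instance, validate=True)
    # contentMediaType: what the decoded bytes are supposed to be.
    if schema.get("contentMediaType") == "image/png" and not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded bytes do not look like a PNG")
    return data


# A JSON document whose "image" member holds a (truncated) PNG as base64 text.
document = json.dumps({"image": base64.b64encode(PNG_SIGNATURE).decode("ascii")})
raw = decode_instance(SCHEMA, json.loads(document)["image"])
print(raw == PNG_SIGNATURE)  # True
```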
How do we describe binary data with a non-empty schema? Should it be this?

```yaml
type: string
contentMediaType: image/png
contentEncoding: binary
```
@spacether `binary` is not a valid `contentEncoding` value. The encoding keyword is about transferring binary data as non-binary JSON string data. Per the JSON Schema Validation spec:
> Possible values indicating base 16, 32, and 64 encodings with several variations are listed in RFC 4648. Additionally, sections 6.7 and 6.8 of RFC 2045 provide encodings used in MIME.

- `base64`, `base64url`, `base32`, `base32hex`, `base16`, `hex`
- `identity`, `quoted-printable`, and `base64`
The JSON Schema Validation spec also notes: "As "base64" is defined in both RFCs, the definition from RFC 4648 SHOULD be assumed unless the string is specifically intended for use in a MIME context."
To transfer a binary resource, `contentEncoding` should be left out. I really need to go clean up that part of the OAS spec. I wrote it for 3.1, and even I find it confusing now.
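A rough sketch of how tooling might map the encoding values listed above onto decoders, using only Python's standard library. The `DECODERS` table and `decode_content` helper are hypothetical; treating `hex` as case-insensitive base16 follows RFC 4648's description of base 16, and `identity` is left out because its handling isn't spelled out in this thread. Anything not in the table, including `binary`, is rejected.

```python
import base64
import quopri

# Hypothetical table: contentEncoding value -> stdlib decoder (bytes in, bytes out).
DECODERS = {
    "base64": base64.b64decode,  # RFC 4648 (also RFC 2045 section 6.8)
    "base64url": base64.urlsafe_b64decode,
    "base32": base64.b32decode,
    # RFC 4648 describes base 16 as case-insensitive hex, hence casefold=True.
    "base16": lambda b: base64.b16decode(b, casefold=True),
    "hex": lambda b: base64.b16decode(b, casefold=True),
    "quoted-printable": quopri.decodestring,  # RFC 2045 section 6.7
}
# base64.b32hexdecode only exists in Python 3.10 and later.
if hasattr(base64, "b32hexdecode"):
    DECODERS["base32hex"] = base64.b32hexdecode


def decode_content(encoding: str, text: str) -> bytes:
    """Decode a JSON string value according to its contentEncoding."""
    try:
        decoder = DECODERS[encoding]
    except KeyError:
        raise ValueError(f"unsupported contentEncoding: {encoding!r}") from None
    # All of these encodings produce ASCII text, so work on its bytes.
    return decoder(text.encode("ascii"))


print(decode_content("base64", "aGk="))  # b'hi'
print(decode_content("hex", "6869"))     # b'hi'
```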
@spacether I'm not sure you need a non-empty schema, btw, as the `image/png` part should be handled by the content type of the request or response, unless it's part of a multipart response, in which case things are more confusing.
A schema for a binary resource definitely should not have `"type": "string"` in OAS 3.1. In OAS 3.0 and earlier, there was stuff with `"type": "string"` and `format`, but that's not how it works in 3.1.
So I am concerned about defining schemas in a location-dependent context in v3.1.0. When an empty schema is defined under a binary media type key in the content map (such as `application/octet-stream`), it means binary is accepted there.
When that same schema appears under a JSON media type key, it means that all JSON types are accepted there.
Looking at the content map definition, one can do this:
```yaml
paths:
  /fake/uploadDownloadFile:
    post:
      tags:
        - fake
      summary: uploads a file and downloads a file using application/octet-stream
      description: ''
      operationId: uploadDownloadFile
      responses:
        '200':
          description: successful operation
          content:
            application/octet-stream:
              schema:
                $ref: '#/components/schemas/AnyTypeSchema'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/AnyTypeSchema'
          application/octet-stream:
            schema:
              $ref: '#/components/schemas/AnyTypeSchema'
components:
  schemas:
    AnyTypeSchema: {}
    NotAnyTypeV1:
      not: {}
    NotAnyTypeV2:
      type: []
    NotAnyTypeV3:
      not:
        type:
          - integer
          - number
          - string
          - object
          - array
          - boolean
          - "null"
    BinaryOnlySchema:
      contentMediaType: application/octet-stream
      type: []
```
Schemas can `$ref` other components. So when the `$ref` points to another location, the schema at that location has no knowledge of the location-specific context or meaning of the empty schema. It's problematic that the same schema can mean "binary is the only data stored here" under `application/octet-stream`, while under `application/json` it can hold str/bool/int/float/dict/list/None. My goal is to have the schema itself describe that binary is allowed, in a `BinaryOnlySchema` component. If `BinaryOnlySchema` only excludes all JSON Schema types, then it is equivalent to `NotAnyTypeV1`/`NotAnyTypeV2`. Does that work, or should the presence of `contentMediaType` indicate that binary is allowed here?
This ambiguity makes it unclear how to implement tooling (code generation) for v3.1.0.
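To check the equivalence claim over ordinary JSON values, here is a deliberately tiny Python evaluator for just the keywords used in the components above (the empty schema, `not`, and `type`). It is an illustrative sketch, not a conforming JSON Schema implementation: it treats `contentMediaType` as a pure annotation, and it simply treats an empty `type` array as matching nothing (a strict meta-schema check may reject `type: []` outright). Note that `json_type` has no answer for Python `bytes`, since binary is not one of the JSON data types.

```python
# Tiny evaluator for the keyword subset used in the components above.
# Illustrative only: not a conforming JSON Schema validator.


def json_type(value) -> str:
    """Map a Python value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    # bytes (and anything else) have no counterpart in the JSON data model.
    raise TypeError(f"{type(value).__name__} is not a JSON value")


def is_valid(schema: dict, instance) -> bool:
    """Evaluate only "not" and "type"; every other keyword is ignored."""
    if "not" in schema and is_valid(schema["not"], instance):
        return False
    if "type" in schema:
        allowed = schema["type"]
        if isinstance(allowed, str):
            allowed = [allowed]
        actual = json_type(instance)
        # An integer also satisfies "number".
        if actual not in allowed and not (actual == "integer" and "number" in allowed):
            return False
    return True


ALL_TYPES = ["integer", "number", "string", "object", "array", "boolean", "null"]
SCHEMAS = {
    "AnyTypeSchema": {},
    "NotAnyTypeV1": {"not": {}},
    "NotAnyTypeV2": {"type": []},
    "NotAnyTypeV3": {"not": {"type": ALL_TYPES}},
    # contentMediaType is ignored here, so this behaves exactly like NotAnyTypeV2.
    "BinaryOnlySchema": {"contentMediaType": "application/octet-stream", "type": []},
}
SAMPLES = [None, True, 3, 2.5, "s", [1], {"k": "v"}]

for name, schema in SCHEMAS.items():
    print(name, [is_valid(schema, s) for s in SAMPLES])
```

Over those seven sample values, `AnyTypeSchema` accepts all of them and the other four schemas accept none, so in this model excluding every JSON type leaves nothing that could stand for binary.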
It is not location-sensitive. The comment about multipart responses has to do with OpenAPI's Encoding Object, which is an area of considerable complexity outside the scope of JSON Schema.
The context here is OpenAPI. If one location allows ingesting and transmitting binary data using the AnyType schema, and another location allows ingesting different data with that same schema definition, it looks location-sensitive (or perhaps media-type-key-sensitive) to me.
@spacether

> If one location allows ingestion and transmission of binary using AnyType schema and another location allows ingestion of different data with that same schema definition it looks location or maybe media type key sensitive to me.
The AnyType schema literally allows everything. There is nothing strange about it being used in different locations that, through other aspects of OpenAPI, further constrain what is allowed. The schema behaves the same everywhere your example uses it.
Okay, then per that logic, binary content can be stored in any empty schema definition. If one does that and then attempts to send that data as `application/json`, serialization of that data would fail. Is that what you envision implementors should do?
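The serialization failure described here is easy to reproduce. In this hypothetical `serialize_body` helper (not part of any OpenAPI tool), the media type, not the schema, decides how a value is written: raw bytes pass straight through for `application/octet-stream`, while Python's `json.dumps` raises `TypeError` for `bytes`, so putting binary into `application/json` needs an explicit text encoding such as base64.

```python
import base64
import json


def serialize_body(media_type: str, value) -> bytes:
    """Hypothetical serializer: the media type decides the wire format."""
    if media_type == "application/octet-stream":
        # Binary bodies are sent as-is; no JSON involved.
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("application/octet-stream bodies must be bytes")
        return bytes(value)
    if media_type == "application/json":
        # json.dumps raises TypeError for bytes: JSON has no binary type.
        return json.dumps(value).encode("utf-8")
    raise ValueError(f"unsupported media type: {media_type!r}")


png_header = b"\x89PNG\r\n\x1a\n"
print(serialize_body("application/octet-stream", png_header) == png_header)  # True

try:
    serialize_body("application/json", png_header)
except TypeError as exc:
    print("application/json failed:", exc)

# Binary inside JSON needs an explicit text encoding (what contentEncoding describes).
print(serialize_body("application/json", base64.b64encode(png_header).decode("ascii")))
```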
Filed https://github.com/OAI/OpenAPI-Specification/issues/3024 for discussion at the meeting tomorrow.
@pbryan I think we should indeed clarify that `"format": "binary"` only applies to places where actual binary data is valid (e.g. not within `application/json`). Tagging this for 3.0.4; it's not relevant to 3.1.1 because the `content*` keywords don't have the same problem.
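As a sketch of what the clarified 3.0.x rule could mean for tooling, the hypothetical check below walks an OpenAPI content map and flags any `"format": "binary"` that sits inside a schema for a JSON media type. Treating `application/json` plus any `+json` suffix as "JSON" is an assumption, and the recursive search is crude (it would also match a `"format": "binary"` pair inside, say, a `const` value).

```python
def is_json_media_type(media_type: str) -> bool:
    """Assumption: application/json or any +json suffix counts as JSON."""
    base = media_type.split(";", 1)[0].strip().lower()
    return base == "application/json" or base.endswith("+json")


def find_binary_format(node, path: str = ""):
    """Yield slash-separated paths to every {"format": "binary"} under node."""
    if isinstance(node, dict):
        if node.get("format") == "binary":
            yield path  # "" means the top-level schema itself
        for key, child in node.items():
            yield from find_binary_format(child, f"{path}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_binary_format(child, f"{path}/{index}")


def lint_binary_in_json(content: dict) -> list:
    """Flag format: binary inside the schema of any JSON media type."""
    problems = []
    for media_type, media_obj in content.items():
        if is_json_media_type(media_type):
            for found in find_binary_format(media_obj.get("schema", {})):
                problems.append((media_type, found))
    return problems


content = {
    "application/json": {
        "schema": {
            "type": "object",
            "properties": {"file": {"type": "string", "format": "binary"}},
        }
    },
    # Fine: raw binary is a valid body for this media type.
    "application/octet-stream": {"schema": {"type": "string", "format": "binary"}},
}
print(lint_binary_in_json(content))  # [('application/json', '/properties/file')]
```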
PRs merged for 3.0.4, with analogous PRs merged for 3.1.1 and 3.2.0 - closing!
It would be good to clarify how implementations should handle `"format": "binary"` when the value is expressed in a JSON representation (i.e. not encoded directly in the entity-body).
The choices I see: