OAI / OpenAPI-Specification

The OpenAPI Specification Repository
https://openapis.org
Apache License 2.0

Clarify: "type": "string", "format": "binary" in non-entity-body #1544

Closed pbryan closed 4 months ago

pbryan commented 6 years ago

It would be good to clarify how implementations should handle "format": "binary" when the value is expressed in a JSON representation (i.e. not encoded directly in the entity-body).

The choices I see:

  1. Interpret as "byte" (i.e. expect it to be base64-encoded).
  2. Prohibit "binary" format in JSON representations.

handrews commented 6 years ago

In recent versions of JSON Schema, this is handled by "contentMediaType" and "contentEncoding":

https://tools.ietf.org/html/draft-handrews-json-schema-validation-01#section-8

These concepts have been part of JSON Schema since before OpenAPI, but under various names and at times in the Hyper-Schema spec (despite having nothing to do with hyperlinks).
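
As a rough illustration of what "contentEncoding" describes (a sketch using Python's standard library, not something taken from either spec): raw bytes cannot appear in a JSON string directly, so the sender base64-encodes them and the receiver decodes them again. The property name `avatar` and the sample bytes are made up for the example.

```python
import base64
import json

# The eight-byte PNG signature, used here only as sample binary data.
raw = b"\x89PNG\r\n\x1a\n"

# Sender: encode the bytes as base64 text so they fit in a JSON string.
# "avatar" is a hypothetical property name.
document = json.dumps({"avatar": base64.b64encode(raw).decode("ascii")})

# Receiver: parse the JSON, then reverse the base64 encoding.
decoded = base64.b64decode(json.loads(document)["avatar"])
```

After the round trip, `decoded` holds the same bytes as `raw`, while the value inside the JSON document is an ordinary string.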

spacether commented 2 years ago

How do we describe binary data with a non empty schema? Should it be this?

            type: string
            contentMediaType: image/png
            contentEncoding: binary

handrews commented 2 years ago

@spacether binary is not a valid contentEncoding value. The encoding keyword is about transferring binary data as non-binary JSON string data. Per the JSON Schema Validation spec:

Possible values indicating base 16, 32, and 64 encodings with several variations are listed in RFC 4648. Additionally, sections 6.7 and 6.8 of RFC 2045 provide encodings used in MIME.

The JSON Schema Validation spec also notes: "As "base64" is defined in both RFCs, the definition from RFC 4648 SHOULD be assumed unless the string is specifically intended for use in a MIME context."

To transfer a binary resource, contentEncoding should be left out. I really need to go clean up that part of the OAS spec. I wrote it for 3.1, and even I find it confusing now.
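
The encodings named in the quoted passages can be tried out with Python's standard `base64` module. This is only a sketch to show how the RFC 4648 and RFC 2045 (MIME) flavours of base64 differ on the wire; the sample data is arbitrary.

```python
import base64

data = bytes(range(60))  # 60 arbitrary bytes, which become 80 base64 characters

# RFC 4648 encodings available in the standard library.
b16 = base64.b16encode(data)
b32 = base64.b32encode(data)
b64 = base64.b64encode(data)              # standard alphabet, no line breaks
b64url = base64.urlsafe_b64encode(data)   # "-" and "_" replace "+" and "/"

# RFC 2045 (MIME) style: same alphabet as b64, with a newline after
# every 76 output characters and a trailing newline.
mime = base64.encodebytes(data)

# Both base64 forms decode to the same bytes; b64decode discards the
# newlines by default.
assert base64.b64decode(b64) == base64.b64decode(mime) == data
```

The payload is identical in both base64 forms; only the line wrapping differs, which is why a consumer needs to know which definition is intended.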

handrews commented 2 years ago

@spacether I'm not sure you need a non-empty schema, btw, as the image/png part should be handled by the content type of the request or response. Unless it's part of a multipart response in which case things are more confusing.

A schema for a binary resource definitely should not have "type": "string" in OAS 3.1. In OAS 3.0 and earlier, there was stuff with "type": "string" and format, but that's not how it works in 3.1.

spacether commented 2 years ago

So I am concerned about schemas whose meaning depends on their location in v3.1.0. When an empty schema is the value under a key in the content map, it means binary is accepted there.

When that same schema sits under a JSON key, it means that all JSON types are accepted there.

Looking at the content map definition one can do this:

paths:
  /fake/uploadDownloadFile:
    post:
      tags:
        - fake
      summary: uploads a file and downloads a file using application/octet-stream
      description: ''
      operationId: uploadDownloadFile
      responses:
        '200':
          description: successful operation
          content:
            application/octet-stream:
              schema:
                $ref: '#/components/schemas/AnyTypeSchema'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/AnyTypeSchema'
          application/octet-stream:
            schema:
              $ref: '#/components/schemas/AnyTypeSchema'
components:
  schemas:
    AnyTypeSchema: {}
    NotAnyTypeV1:
      not: {}
    NotAnyTypeV2:
      type: []
    NotAnyTypeV3:
      not:
        type:
          - integer
          - number
          - string
          - object
          - array
          - boolean
          - "null"
    BinaryOnlySchema:
      contentMediaType: application/octet-stream
      type: []

Schemas can $ref other components. When a $ref points to another location, the schema at that location has no knowledge of the location-specific meaning of an empty schema. It is problematic that the same schema means "binary only" under application/octet-stream but can hold str/bool/int/float/dict/list/None under application/json. My goal is for the schema itself to say that binary is allowed, as in a BinaryOnlySchema component. If BinaryOnlySchema only excludes all JSON Schema types, then it is equivalent to NotAnyTypeV1/NotAnyTypeV2. Does that work, or should the presence of contentMediaType hint that binary is allowed here?

This ambiguity makes it unclear how to implement tooling (code generation) for v3.1.0.

handrews commented 2 years ago

It is not location-sensitive. The comment about multipart responses has to do with OpenAPI's Encoding Object, which is an area of considerable complexity outside the scope of JSON Schema.

spacether commented 2 years ago

The context here is OpenAPI. If one location allows ingestion and transmission of binary using AnyType schema and another location allows ingestion of different data with that same schema definition it looks location or maybe media type key sensitive to me.

handrews commented 2 years ago

@spacether

If one location allows ingestion and transmission of binary using AnyType schema and another location allows ingestion of different data with that same schema definition it looks location or maybe media type key sensitive to me.

The AnyType schema literally allows everything. There is nothing strange about it being used in different locations that, through other aspects of OpenAPI, further constrain what is allowed. The schema behaves the same everywhere your example uses it.

spacether commented 2 years ago

Okay, then per that logic binary content can be stored in any empty schema definition. If one does that and attempts to send that data as application/json, then serialization of that data would fail. Is that what you envision implementors should do?
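
The failure mode described here is easy to reproduce. A minimal sketch, assuming a hypothetical `serialize` helper that picks the wire format from the media type (the helper and its error messages are illustrative, not part of any OpenAPI tooling): the same bytes value would pass an empty schema in both cases, but only one of the two media types can carry it.

```python
import json


def serialize(value: object, media_type: str) -> bytes:
    """Hypothetical helper: build a request body for the given media type."""
    if media_type == "application/json":
        # json.dumps raises TypeError for bytes, which JSON cannot represent.
        return json.dumps(value).encode("utf-8")
    if media_type == "application/octet-stream":
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("application/octet-stream needs bytes")
        return bytes(value)
    raise ValueError(f"unsupported media type: {media_type}")


payload = b"\x00\x01\x02"
body = serialize(payload, "application/octet-stream")  # raw bytes pass through

try:
    serialize(payload, "application/json")
    json_failed = False
except TypeError:
    json_failed = True  # bytes are not JSON serializable
```

In this sketch the constraint comes from the media type's serializer rather than from the schema, which is one way to read the exchange above.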

spacether commented 2 years ago

Filed https://github.com/OAI/OpenAPI-Specification/issues/3024 for discussion at the meeting tomorrow.

handrews commented 1 year ago

@pbryan I think we should indeed clarify that "format": "binary" only applies to places where actual binary data is valid (e.g. not within application/json). Tagging this for 3.0.4 – not relevant to 3.1.1 because the content* keywords don't have the same problem.
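
A tool could check OAS 3.0 documents for this with a small lint pass. The sketch below is hypothetical: the function name, the path format, and the decision to follow only `properties` and `items` are assumptions, and `$ref` is not resolved. It reports string schemas with `format: binary` that sit under `application/json` in a content map.

```python
def binary_in_json(content: dict) -> list[str]:
    """Return paths of `format: binary` string schemas under application/json."""
    found = []

    def walk(schema, path):
        if not isinstance(schema, dict):
            return
        if schema.get("type") == "string" and schema.get("format") == "binary":
            found.append(path)
        # Simplification: only `properties` and `items` are followed.
        for name, sub in schema.get("properties", {}).items():
            walk(sub, f"{path}/properties/{name}")
        walk(schema.get("items"), f"{path}/items")

    json_media = content.get("application/json", {})
    walk(json_media.get("schema"), "application/json/schema")
    return found


# Example content map: the binary field under application/json is flagged,
# the application/octet-stream body is not.
content = {
    "application/json": {
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "file": {"type": "string", "format": "binary"},
            },
        }
    },
    "application/octet-stream": {"schema": {"type": "string", "format": "binary"}},
}

flagged = binary_in_json(content)
```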

handrews commented 4 months ago

PRs merged for 3.0.4, with analogous PRs merged for 3.1.1 and 3.2.0 - closing!