OAI / OpenAPI-Specification

The OpenAPI Specification Repository
https://openapis.org
Apache License 2.0

OpenAPI 3.0 does not support csv-serialized form-data arrays #2018

Closed mkistler closed 4 years ago

mkistler commented 5 years ago

The OpenAPI 2.0 behavior

In OpenAPI 2.0, a parameter defined as in: formData with type: array defaulted to collectionFormat: csv, meaning that the values of the array should be joined into a comma-separated string and passed in a single form data element.

The OpenAPI 3.0 behavior

In OpenAPI 3.0, the default for arrays in a request body appears to be the equivalent of collectionFormat: multi — pass each element of the array in its own form data element. This was hard to track down but here's my step-by-step reading of the spec:

Details -- feel free to skip

In OpenAPI 3.0, form parameters are now described as properties in a request body.

An encoding attribute can be used to specify the serialization of parts of multipart request bodies.

Within the encoding object is a style property which

Describes how a specific property value will be serialized depending on its type. See Parameter Object for details on the style property. The behavior follows the same values as query parameters, including default values. This property SHALL be ignored if the request body media type is not application/x-www-form-urlencoded.

So when the media type is other than application/x-www-form-urlencoded, the serialization will be handled using the default style for query parameters, which is 'form'.

In Style Values, style of form means:

Form style parameters defined by RFC6570. This option replaces collectionFormat with a csv (when explode is false) or multi (when explode is true) value from OpenAPI 2.0.

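So what is explode? Back to the encoding object, whose description of explode says that the default value is true when style is form (and false for all other styles), and that, like style, the property is ignored if the request body media type is not application/x-www-form-urlencoded.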


So when media type is other than application/x-www-form-urlencoded, all properties of the request body are serialized as style: form, explode: true -- the equivalent of collectionFormat: multi in OpenAPI 2.0.
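To make the two serializations concrete, here is a minimal Python sketch of what explode means for an array-valued form property. The function name serialize_form_array, the field name ids, and the sample values are assumptions made up for illustration; none of them come from the spec or from any OpenAPI tooling.

```python
from typing import List, Tuple


def serialize_form_array(name: str, values: List[str], explode: bool) -> List[Tuple[str, str]]:
    """Return the form fields produced for an array under style: form.

    explode=False mirrors OpenAPI 2.0 collectionFormat: csv (a single field).
    explode=True mirrors OpenAPI 2.0 collectionFormat: multi (one field per item).
    """
    if explode:
        return [(name, value) for value in values]
    return [(name, ",".join(values))]


ids = ["dogs", "cats"]  # made-up sample values
csv_fields = serialize_form_array("ids", ids, explode=False)
multi_fields = serialize_form_array("ids", ids, explode=True)
```

Here csv_fields holds one field carrying the joined string, while multi_fields holds one field per array item.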

Conclusion

So the default value changed. That's not great but would probably be acceptable in a new major release of the spec, except that there appears to be no way to specify the OpenAPI 2.0 default behavior (collectionFormat: csv) for properties in multipart request bodies in OpenAPI 3.0. Both the style and explode properties in encoding are explicitly restricted to media type application/x-www-form-urlencoded, so only the default serialization (style: form, explode: true) is possible.
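The reading above can be sketched as a small resolution function. This is only an illustration of the interpretation argued in this issue, not spec text or the behavior of any real library; effective_serialization and its parameters are hypothetical names.

```python
from typing import Optional, Tuple

FORM_URLENCODED = "application/x-www-form-urlencoded"


def effective_serialization(
    media_type: str,
    style: Optional[str] = None,
    explode: Optional[bool] = None,
) -> Tuple[str, bool]:
    """Resolve (style, explode) for a request body property under this reading."""
    if media_type != FORM_URLENCODED:
        # Both properties "SHALL be ignored" here, so any declared value is dropped.
        style, explode = None, None
    resolved_style = style or "form"  # default style for query parameters
    if explode is None:
        # Assumed default per the explode description: true for form, false otherwise.
        explode = resolved_style == "form"
    return resolved_style, explode
```

Under this sketch, asking for explode=False on a multipart/form-data body still resolves to ("form", True), which is exactly the limitation described.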

mkistler commented 5 years ago

Here is my high-level proposal for how to fix this issue.

I think it is highly desirable to fix this in a point release of 3.0, so it must be done compatibly. And I think that means that

In light of this, I think we have to use new keywords to specify this behavior. For example, we could add two new keywords to the encoding object, multipart-style and multipart-explode, that have the same basic meaning as style and explode but are ignored if the request body media type is not multipart. This would effectively eliminate the ignoring of style and explode for multipart request bodies, but in a compatible way.

I'll be the first to admit that this is not a pretty solution, and I'd be happy for suggestions on better names than multipart-style and multipart-explode, but I believe this corrects the basic problem in the spec and does it in a compatible way.

Feedback welcome!

mkistler commented 4 years ago

A little more context: An example of this issue is the classify operation of the IBM Watson Visual Recognition service. In the OpenAPI 2.0 version of VR's API doc, classify accepts a form parameter:

    {
      "name": "classifier_ids",
      "in": "formData",
      "description": "Which classifiers to apply.",
      "required": false,
      "type": "array",
      "items": {
        "type": "string"
      }
    }

In OpenAPI 2.0, because no "collectionFormat" is specified, the default is to encode this as a csv string.

When we updated our API docs to OpenAPI 3.0, this form parameter became a property in the "multipart/form-data" request body:

      "classifier_ids": {
        "description": "Which classifiers to apply.",
        "type": "array",
        "items": {
          "type": "string"
        }
      }   

By the reasoning explained in the issue description, this property defaults to style=form and explode=true, and these characteristics cannot be changed in the encoding object because the media type is not application/x-www-form-urlencoded.

mkistler commented 4 years ago

Based on the discussion in the 10/10 OpenAPI TSC meeting, there may be more flexibility in how we address this issue than I previously stated. So here's my "ideal" solution to this issue:

Remove the sentence "This property SHALL be ignored if the request body media type is not application/x-www-form-urlencoded" from both the style and explode properties of the encoding object.

I understand that these sentences were originally added because of concern for how style and explode might be interpreted for request bodies of other media types like application/json, but the spec already states:

The encoding attribute ... is only applicable to multipart and application/x-www-form-urlencoded request bodies.

Removing those qualifying sentences will make it possible to define array properties in multipart/form-data request bodies as explode: false -- the equivalent of the old (and default) OpenAPI 2.0 collectionFormat: csv.
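If those sentences were removed, a multipart/form-data request body could presumably opt back into the csv behavior along the following lines. The fragment is written as a Python dict mirroring the OpenAPI JSON structure and reuses the classifier_ids property from the Visual Recognition example; the helper declared_explode is hypothetical and only illustrates how a tool might read the setting.

```python
request_body = {
    "content": {
        "multipart/form-data": {
            "schema": {
                "type": "object",
                "properties": {
                    "classifier_ids": {
                        "type": "array",
                        "items": {"type": "string"},
                    }
                },
            },
            # The part that is currently ignored for multipart bodies.
            "encoding": {
                "classifier_ids": {"style": "form", "explode": False}
            },
        }
    }
}


def declared_explode(body: dict, media_type: str, prop: str) -> bool:
    """Look up explode for a property, assuming style: form (default True)."""
    encoding = body["content"][media_type].get("encoding", {})
    return encoding.get(prop, {}).get("explode", True)
```

With the encoding entry present the helper reports explode as False; without it, the style: form default of True applies.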

mkistler commented 4 years ago

After the TSC meeting last week, @webron reached out to me on Slack regarding this issue.

@webron believes (apologies if I captured this incorrectly) that the OAS 3.0.x spec does not specify how arrays should be serialized in multipart request bodies.

In particular, the statements

This property SHALL be ignored if the request body media type is not application/x-www-form-urlencoded.

in the description of style and explode in the encoding object (and allowReserved for that matter) really mean:

This property has no effect on the encoding of properties if the request body media type is not application/x-www-form-urlencoded.

Further, the encoding object is intended to be consistent with the way parameters are defined, using either a schema or a content object:

A parameter MUST contain either a schema property, or a content property, but not both.

Note that a query parameter for passing an array of strings can be defined as:

    in: query
    name: collection_ids
    content:
      text/plain:
        schema:
          type: array
          items:
            type: string

but when defined this way there is no way to specify how the array is serialized.

So array parameters defined with a content object have the same problem as array properties in multipart request bodies in that the serialization style cannot be specified.
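A rough Python sketch of that contrast, with the parameter definitions written as dicts. The schema-based variant and both helper functions are assumptions added for illustration; only the content-based parameter comes from the example above.

```python
# The content-based parameter from the example above.
content_param = {
    "in": "query",
    "name": "collection_ids",
    "content": {
        "text/plain": {
            "schema": {"type": "array", "items": {"type": "string"}}
        }
    },
}

# Hypothetical schema-based counterpart, where style/explode can be stated.
schema_param = {
    "in": "query",
    "name": "collection_ids",
    "style": "form",
    "explode": False,
    "schema": {"type": "array", "items": {"type": "string"}},
}


def has_exactly_one_of_schema_or_content(parameter: dict) -> bool:
    """Check the rule that a parameter has schema or content, but not both."""
    return ("schema" in parameter) != ("content" in parameter)


def can_declare_serialization(parameter: dict) -> bool:
    """style/explode only describe schema-based parameters in this sketch."""
    return "schema" in parameter and "content" not in parameter
```

Both definitions satisfy the schema-or-content rule, but only the schema-based one has a place to say how the array is serialized.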

@webron believes that a proper fix would address both these problems in a consistent way.

mkistler commented 4 years ago

Not sure how to engage others in this discussion, so I guess I will just continue my monologue.

I want to point out that the contentType attribute in the encoding object does not state that it is mutually exclusive with style/explode, so this is NOT like parameters where only one or the other may be used. In other words, contentType and style/explode must be compatible for application/x-www-form-urlencoded request bodies. I see no reason why they cannot also be compatible for multipart/form-data request bodies.

I failed to mention in my previous post that @webron believes that for an array property, contentType applies to the entire array, and not the individual items of the array. At the very least, we should clarify the spec on this detail.

I contend that @webron’s interpretation is inconsistent with the way the default values are described:

for array – the default is defined based on the inner type.

Further, if we adopted @webron’s interpretation, I don’t know how this would work in practice. Consider a request body containing an array whose individual items are of contentType image/jpeg. You could not describe the entire array as being of image/jpeg, since the array contains multiple images and image/jpeg describes a single image.

So I believe our clarification should be that contentType specifies the content type of the individual items when a property is an array.

And if we make that clarification, then why not allow the explode attribute to be specified for any request body type that can have an encoding object?

handrews commented 4 years ago

@mkistler I'm kind of parachuting into the middle here, but isn't part of the point of multipart/form-data that each part has its own subsidiary media type? Or am I confusing it with something else?

mkistler commented 4 years ago

@handrews Thank you for joining the discussion!!

Agreed that a key feature of multipart/form-data is the ability to specify a media-type with each form part. What we still need to determine is what data goes into what form parts.

In OpenAPI 2.0, the default behavior was to concatenate array items into a csv, all in one form part. In OpenAPI 3.0, the behavior is either unspecified, if you accept @webron's interpretation, or that each item is passed in a separate form part with no ability to specify csv concatenation into a single form part, using the interpretation I gave at the top of this issue.
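As a rough picture of "what data goes into what form parts", the following Python sketch renders both layouts as a multipart/form-data body. It is deliberately simplified (text fields only, no per-part Content-Type, no escaping), and the multipart_body helper, the boundary string, and the sample values are all made up for illustration.

```python
from typing import List, Tuple


def multipart_body(fields: List[Tuple[str, str]], boundary: str) -> str:
    """Render text fields as a simplified multipart/form-data body."""
    lines = []
    for name, value in fields:
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the part body
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing delimiter
    return "\r\n".join(lines) + "\r\n"


ids = ["dogs", "cats", "food"]  # made-up sample values

# OpenAPI 2.0 default (csv): every item in a single form part.
one_part = multipart_body([("classifier_ids", ",".join(ids))], "XyZ")

# One form part per item, all sharing the same name.
many_parts = multipart_body([("classifier_ids", v) for v in ids], "XyZ")
```

The first body contains a single part whose content is the comma-joined string; the second repeats the classifier_ids part once per item.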

darrelmiller commented 4 years ago

Following conversations in the TSC meeting, we are leaning towards allowing the style, explode and allowReserved properties in the encoding object to be used for multipart/form-data. multipart/form-data is just a different serialization of a URL-encoded form and therefore should support the same set of behaviors.

webron commented 4 years ago

Closing as #2066 was merged! 🎉