OAI / OpenAPI-Specification

(De-)Serializing null, required, and empty values in OAS parameters #2037

Closed bodawei closed 4 months ago

bodawei commented 5 years ago

(De-)Serializing null, required, and empty values in OAS parameters

At a couple of points in the last year, I've tried to understand how to serialize and deserialize parameter values based on an OAS definition. The mainstream cases are all very straightforward (which is great). However, I'm trying to write a generalized processor using an arbitrary OAS definition as a guide, and there are "corners" of the spec whose interpretation is not immediately clear.

This document summarizes my understanding. I think OAS is both great and really handy, so I'm hoping that by writing it down in this much detail, it will lead others to point out flaws in my thinking, or guide others who are struggling with the same questions ( e.g. issue #1915 ). Or both!

A detailed summary exists in the "Parameter serialization" section, further below.

Preliminaries

Materials

Ideas

I tend to think of all the data (payloads and parameters) as having a canonical representation in a JSON-compatible structure (by that I mean, a data structure exactly representable by a JSON string). While I don't like positing abstract intermediate systems, I find it helps reasoning about the translation from an application's own data structures to the world that the OAS specification defines, and from that to the final serialization format (JSON, application/x-www-form-urlencoded, RFC6570, etc). In particular, the path from application to the canonical representation is largely outside of the scope of my discussion here. That transformation, itself, is arbitrarily complex and not knowable by anyone but the developers of that application. My discussion here is focused on going from that JSON-compatible representation to a serialized format and back.

empty

A notion that is important to think about here is that of "empty". This emerges from RFC6570, and simply means a string with 0 elements (characters/octets). RFC6570 appears to only work with strings, lists/arrays of strings, or associative arrays (key/value pairs) and undef (discussed below). As such, to understand empty, you must first serialize your values into strings (and the elements of a list to strings, and the values of your key/value pairs into strings). Given the OAS data types, the only values that can be empty are those with type/formats string (with no format), or string/byte, string/binary, string/password.

undef

RFC6570 also has a notion of "undef". This is not explicitly named in the OAS spec, but it seems unambiguous that it plays an important part in parameter (de-)serialization. It is described (in section 2.3 of RFC6570) as:

An expression MAY reference variables that are unknown to the template processor or whose value is set to a special "undefined" value, such as undef or null. Such undefined variables are given special treatment by the expansion process (Section 3.2.1).

Furthermore:

A variable defined as a list value is considered undefined if the list contains zero members. A variable defined as an associative array of (name, value) pairs is considered undefined if the array contains zero members or if all member names in the array are associated with undefined values.

In our JSON-compatible abstract data structure, this means null, [], {}, and { foo: null } are all "undef" for RFC6570 purposes.

While an empty value is serialized as an empty string (resulting in serializing foo and emptyParam as foo=bar&emptyParam=), undef values are treated as if they had never been serialized in the first place (resulting in serializing foo and undefParam as foo=bar).
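
A minimal sketch of these two notions, assuming the JSON-compatible structure is modeled in Python (None for null, list for arrays, dict for objects). The names is_undef, is_empty, and form_query are hypothetical, and form_query only handles string values:

```python
from typing import Any, Dict, Optional


def is_undef(value: Any) -> bool:
    """Return True if RFC 6570 would treat the value as undefined."""
    if value is None:
        return True
    if isinstance(value, list):
        return len(value) == 0
    if isinstance(value, dict):
        # Also covers {} because all() over nothing is True.
        return all(member is None for member in value.values())
    return False


def is_empty(value: Any) -> bool:
    """Return True only for the zero-length string."""
    return isinstance(value, str) and value == ""


def form_query(params: Dict[str, Optional[str]]) -> str:
    """Join string parameters form-style, skipping undef values entirely."""
    return "&".join(
        f"{name}={value}" for name, value in params.items() if not is_undef(value)
    )
```

Here form_query({"foo": "bar", "emptyParam": ""}) keeps the empty parameter, while form_query({"foo": "bar", "undefParam": None}) drops the undef one.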

allowEmptyValue

One of the OAS properties that seems like it applies to these serialization questions is allowEmptyValue. However, the more you look at this, the more confusing (and probably even contradictory) it seems. tedepstein pursued a heroic effort to get clarity about this ( https://github.com/OAI/OpenAPI-Specification/issues/1573 ). In his summary, he wrote:

The meaning of an empty value is application-defined. It may denote an Unassigned Value (very similar or identical to an explicit null value), an Unspecified Value (equivalent to omitting the parameter), a flag (boolean value), or some other meaning as determined by the API provider. (In the case of a flag, it does not take on a different syntactic form, without the equal sign. It's just a different possible interpretation of the same syntax.)

I, personally, find this mildly troubling since it means this "empty" is not the same as the empty from RFC6570. Because this is application dependent ("determined by the API provider"), it isn't clear to me that this actually has any applicability to questions of serialization and deserialization (from and to that JSON-compatible data structure) that I'm worried about here, and instead lives in the translation space from the application's data structures to the JSON-compatible ones.

Given this, and the fact that it is deprecated, my choice is to ignore it entirely in this discussion. This may be a foolhardy decision!

Parameter serialization

With all these preliminaries out of the way, let's look at parameter serialization. The first thing to note is that there are several properties that affect this:

As mentioned above, this is going to ignore allowEmptyValue.

As I've worked through the various cases these allow for, I have come to feel that the two central challenges that need to be accounted for are:

  1. How to represent something like null (especially given that nullable property)
  2. How to assure that the result of deserialization is the same as what was serialized (put another way: how to avoid data corruption).

In practice, the issue of handling empty values usually gets pulled into the discussion. However, this seems to only be a problem because of trying to answer these two questions.

The following sections present a long (long!) discussion of these questions. My summary of all that is, however:

The punchline here is that I do not think it is possible to write an OAS-based parameter (de-)serialization system which does not cause some kind of data corruption when going from and then back to the aforementioned canonical JSON-compatible data system. The best you can do is to define some constraints on what data you allow to be fed into the system to start with. From an absolute perspective, this is very unfortunate. From a pragmatic and application-specific perspective, this is probably not actually a big problem (you simply need to be clear what your data transformation path is).

Note that in the tables below, <no prop> indicates the case where the property is not only not null, but the property itself doesn't exist in some way (equivalent to a JSON object {} without the property key present). Also, prop: 'a' is representative of any property with any non-empty primitive value (true, 65, 3.13e-5, 2019-10-20, etc).

Query Parameters

There are four styles for query parameter (de-)serialization:

style: form

For primitive values

We might expect these serializations, given the inputs on the top, and the rules on the left:

Rules                           <no prop>  prop: null  prop: ''  prop: 'a'
required=false, nullable=false  (omitted)  INVALID     prop=     prop=a
required=true, nullable=false   INVALID    INVALID     prop=     prop=a
required=false, nullable=true   (omitted)  prop=       prop=     prop=a
required=true, nullable=true    INVALID    prop=       prop=     prop=a

Even with this simple table, we have several problems:

  1. The above table makes it ambiguous, when deserializing, whether a particular entry should be considered null or empty. You can disambiguate the cases where nullable=false, but not the others.
  2. Going strictly by RFC6570, the null values should be treated as undef. In this case, the null column should be identical to the <no prop> column.
  3. Going strictly by RFC6570, it is therefore ambiguous when deserializing, whether a particular entry represents an absent property or a property with a null value.
    • NOTE: Whether this is even a real case will depend on your interpretation of other factors. For example, if you do consider required and nullable as describing different cases, then it is reasonable to talk about an absent property that is not null. And if you picture parameters as living in a JSON object, then it is completely reasonable to have an absent property. On the other hand, some will certainly find those notions nonsensical, in which case (2) and (3) aren't problems at all, the nullable property has no meaning, and many of the problems listed below are not problems either.
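
A rough sketch of the idealized table above, to make problem (1) concrete. It assumes string-typed values only; ABSENT is a hypothetical sentinel standing in for <no prop>, and serialize_form_primitive is an illustrative name, not part of any OAS tooling:

```python
from typing import Optional

ABSENT = object()  # hypothetical sentinel for "<no prop>"


def serialize_form_primitive(
    name: str, value: object, required: bool, nullable: bool
) -> Optional[str]:
    """Follow the idealized table: None means nothing is sent."""
    if value is ABSENT:
        if required:
            raise ValueError(f"{name} is required")
        return None
    if value is None:
        if not nullable:
            raise ValueError(f"{name} is not nullable")
        return f"{name}="  # null is written exactly like an empty string
    return f"{name}={value}"
```

With nullable=True, both None and "" come out as prop=, so a deserializer that only sees the query string cannot tell which one was meant.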

If your property is not an empty-able string, however, then the serialization table looks much less problematic:

Rules                           <no prop>  prop: null  prop: true
required=false, nullable=false  (omitted)  INVALID     prop=true
required=true, nullable=false   INVALID    INVALID     prop=true
required=false, nullable=true   (omitted)  prop=       prop=true
required=true, nullable=true    INVALID    prop=       prop=true

This suggests that non-empty-able values are well-handled!

However, if you believe RFC6570 holds sway here, then this is still a problem since the null value should result in the same serialization as the property not being present:

Rules                           <no prop> / prop: null  prop: ''  prop: 'a'
required=false, nullable=false  (omitted)               prop=     prop=a
required=true, nullable=false   INVALID                 prop=     prop=a
required=false, nullable=true   (omitted)               prop=     prop=a
required=true, nullable=true    INVALID                 prop=     prop=a

At this point, however, one can see that the value of nullable is redundant. We can simply ignore it. This leaves us with a much simpler table:

Rules           <no prop> / prop: null  prop: ''  prop: 'a'
required=false  (omitted)               prop=     prop=a
required=true   INVALID                 prop=     prop=a

This leaves us with only one data corruption problem: the absence of a property and a property with a null value are conflated (which, again, may not actually be a problem in your world).
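
The strict reading can be sketched as a round trip. This is only an illustration under the assumption that values are strings and that None stands for both <no prop> and null; serialize_strict and deserialize_strict are hypothetical names:

```python
from typing import Optional
from urllib.parse import parse_qs, quote


def serialize_strict(name: str, value: Optional[str], required: bool) -> Optional[str]:
    """Strict RFC 6570 reading: None covers both <no prop> and null."""
    if value is None:
        if required:
            raise ValueError(f"{name} is required")
        return None  # undef: the parameter is not sent at all
    return f"{name}={quote(value, safe='')}"


def deserialize_strict(name: str, query: str) -> Optional[str]:
    """Return the first value for name, or None if it was not sent."""
    values = parse_qs(query, keep_blank_values=True).get(name)
    return values[0] if values else None
```

None, "" and "a" each survive the round trip; what is lost is only the distinction between an absent property and a null one.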

If, on the other hand, we wanted to ignore RFC6570 and allow nullable to have meaning, we can achieve this only with a data constraint like declaring that strings that have minLength: 0 are invalid when a property is nullable: true.

Rules                           <no prop>  prop: null  prop: ''  prop: 'a'
required=false, nullable=false  (omitted)  INVALID     prop=     prop=a
required=true, nullable=false   INVALID    INVALID     prop=     prop=a
required=false, nullable=true   (omitted)  prop=       INVALID   prop=a
required=true, nullable=true    INVALID    prop=       INVALID   prop=a

For a human, this is much more complex (and actually using an API that supported all these would likely be a bad developer experience), but for a machine it is fine.
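
As a sketch of how a machine could apply that constraint, assuming string values, a hypothetical ABSENT sentinel for <no prop>, and illustrative function names:

```python
from typing import Optional
from urllib.parse import parse_qs, quote

ABSENT = object()  # hypothetical sentinel for "<no prop>"


def serialize_constrained(
    name: str, value: object, required: bool, nullable: bool
) -> Optional[str]:
    if value is ABSENT:
        if required:
            raise ValueError(f"{name} is required")
        return None
    if value is None:
        if not nullable:
            raise ValueError(f"{name} is not nullable")
        return f"{name}="
    if value == "" and nullable:
        # The added constraint: a nullable parameter may not be empty.
        raise ValueError(f"{name} may not be empty when nullable")
    return f"{name}={quote(str(value), safe='')}"


def deserialize_constrained(name: str, query: str, nullable: bool) -> object:
    values = parse_qs(query, keep_blank_values=True).get(name)
    if not values:
        return ABSENT
    if values[0] == "":
        # Unambiguous only because of the constraint above.
        return None if nullable else ""
    return values[0]
```

Every value the serializer accepts now deserializes back to itself, at the cost of rejecting '' whenever nullable is true.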

For arrays with explode: false

With all the above discussion done, let's move on to the next case of style: form serialization: arrays.

We again start off with a somewhat idealized table:

Rules                           <no prop>  prop: null  prop: []  prop: [null]  prop: ['']  prop: ['a']  prop: ['a', 'b']  prop: ['a', '']  prop: ['a', null, '']
required=false, nullable=false  (omitted)  INVALID     prop=     prop=         prop=       prop=a       prop=a,b          prop=a,          prop=a,
required=true, nullable=false   INVALID    INVALID     prop=     prop=         prop=       prop=a       prop=a,b          prop=a,          prop=a,
required=false, nullable=true   (omitted)  prop=       prop=     prop=         prop=       prop=a       prop=a,b          prop=a,          prop=a,
required=true, nullable=true    INVALID    prop=       prop=     prop=         prop=       prop=a       prop=a,b          prop=a,          prop=a,

Notes:

  1. This has the same problem as the primitives table: an absent property and a null value cannot be told apart.
  2. This has a similar problem with an empty array (prop: []). In RFC6570 this is the same as a null value.
  3. While it doesn't explicitly state this, RFC6570 might be taken to imply that an undef value in an array should be treated the same as if that value isn't there. Thus, for example, prop: [null] is identical to prop: [] which is identical to prop: null which is the same as not having the property.
  4. While neither the OAS specification nor RFC6570 seem to address it, both seem to implicitly assume arrays may only contain primitive values.
  5. Having an undef value anywhere in the array causes deserialization ambiguity.

If we take the strict RFC6570 strategy, from the previous section (assuming an undef value means the property is not serialized at all), then we end up with the simpler:

Rules           <no prop> / prop: null / prop: [] / prop: [null]  prop: ['']  prop: ['a']  prop: ['a', 'b']  prop: ['a', ''] / prop: ['a', null, '']
required=false  (omitted)                                         prop=       prop=a       prop=a,b          prop=a,
required=true   INVALID                                           prop=       prop=a       prop=a,b          prop=a,
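
The strict table above can be sketched roughly as follows. This assumes arrays of strings, treats None members as undef and skips them (the reading that note 3 says RFC6570 might imply), and uses the hypothetical name serialize_form_array:

```python
from typing import List, Optional
from urllib.parse import quote


def serialize_form_array(
    name: str, items: Optional[List[Optional[str]]], required: bool
) -> Optional[str]:
    """style: form, explode: false, under a strict RFC 6570 reading."""
    # Assumption: undef members are dropped before joining.
    defined = [item for item in (items or []) if item is not None]
    if not defined:
        # null, [], and [null] all collapse into "not sent".
        if required:
            raise ValueError(f"{name} is required")
        return None
    return f"{name}=" + ",".join(quote(item, safe="") for item in defined)
```

For example, ['a', null, ''] and ['a', ''] both become prop=a, which is why they share a column.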

This leaves us with only a couple remaining problems:

If, instead, we take the non-strict-RFC6570 stance (which you would do if you want to be able to explicitly represent null), we can eliminate ambiguity if we add several constraints:

Note: This is not the only set of constraints that you can add to avoid ambiguity! For example, an alternate to the third would be "Array elements that are an empty-able string type must have a minLength: 1 if their parent array is nullable, unless that parent array has at least minItems: 2."

Rules                           <no prop>  prop: null  prop: []  prop: [null]  prop: ['']  prop: ['a']  prop: ['a', 'b']  prop: ['a', '']  prop: ['a', '', null]
required=false, nullable=false  (omitted)  INVALID     INVALID   INVALID       prop=       prop=a       prop=a,b          prop=a,          INVALID
required=true, nullable=false   INVALID    INVALID     INVALID   INVALID       prop=       prop=a       prop=a,b          prop=a,          INVALID
required=false, nullable=true   (omitted)  prop=       INVALID   INVALID       INVALID     prop=a       prop=a,b          INVALID          INVALID
required=true, nullable=true    INVALID    prop=       INVALID   INVALID       INVALID     prop=a       prop=a,b          INVALID          INVALID

For arrays with explode: true

We again start off with this full table:

Rules                           <no prop>  prop: null  prop: []  prop: [null]  prop: ['']  prop: ['a']  prop: ['a', 'b']  prop: ['a', '']  prop: ['a', '', null]
required=false, nullable=false  (omitted)  INVALID     prop=     prop=         prop=       prop=a       prop=a&prop=b     prop=a&prop=     prop=a&prop=
required=true, nullable=false   INVALID    INVALID     prop=     prop=         prop=       prop=a       prop=a&prop=b     prop=a&prop=     prop=a&prop=
required=false, nullable=true   (omitted)  prop=       prop=     prop=         prop=       prop=a       prop=a&prop=b     prop=a&prop=     prop=a&prop=
required=true, nullable=true    INVALID    prop=       prop=     prop=         prop=       prop=a       prop=a&prop=b     prop=a&prop=     prop=a&prop=

This results in the same set of drawbacks, and the same sets of ways to resolve this, as the explode: false case.
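
A small sketch of the exploded form, assuming arrays of strings only; the function names are illustrative:

```python
from typing import List
from urllib.parse import parse_qs, quote


def serialize_exploded(name: str, items: List[str]) -> str:
    """style: form, explode: true: repeat the name once per item."""
    return "&".join(f"{name}={quote(item, safe='')}" for item in items)


def deserialize_exploded(name: str, query: str) -> List[str]:
    """Collect every value sent under name, keeping empty ones."""
    return parse_qs(query, keep_blank_values=True).get(name, [])
```

A receiver handed prop= gets [''] back from deserialize_exploded, whichever of the inputs that share prop= in the full table actually produced it.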

For objects with explode: false

Rules                           <no prop>  prop: null  prop: {}  prop: { p: null }  prop: { p: '' }  prop: { p: 'a' }
required=false, nullable=false  (omitted)  INVALID     prop=     prop=              prop=p,          prop=p,a
required=true, nullable=false   INVALID    INVALID     prop=     prop=              prop=p,          prop=p,a
required=false, nullable=true   (omitted)  prop=       prop=     prop=              prop=p,          prop=p,a
required=true, nullable=true    INVALID    prop=       prop=     prop=              prop=p,          prop=p,a

By now, you can probably see that if you do a strict RFC6570 interpretation, you'll collapse the first four columns into a single one and thereby be able to ignore the nullable property. Deserializing an absent parameter name will be ambiguous, but everything else is unambiguous.
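
A sketch of that strict interpretation for objects, assuming string member values and skipping members whose value is None; serialize_form_object is a hypothetical name:

```python
from typing import Dict, List, Optional
from urllib.parse import quote


def serialize_form_object(
    name: str, obj: Optional[Dict[str, Optional[str]]], required: bool
) -> Optional[str]:
    """style: form, explode: false, under a strict RFC 6570 reading."""
    pairs = [(key, val) for key, val in (obj or {}).items() if val is not None]
    if not pairs:
        # null, {}, and { p: null } all collapse into "not sent".
        if required:
            raise ValueError(f"{name} is required")
        return None
    flat: List[str] = []
    for key, val in pairs:
        flat.append(quote(key, safe=""))
        flat.append(quote(val, safe=""))
    return f"{name}=" + ",".join(flat)
```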

If you do a non-strict RFC6570 interpretation, then you'll need to add various constraints.

For objects with explode: true
<no prop> prop: null prop: {} prop: { p: null } prop: { p : '' } prop: { p: 'a' }
required=false
nullable=false
INVALID p= p=a
required=true
nullable=false
INVALID INVALID INVALID? INVALID? p= p=a
required=false
nullable=true
p= p= p=a
required=true
nullable=true
INVALID INVALID? INVALID? INVALID? p= p=a

Notes:

If you go with a strict RFC6570 approach (which again ends up making nullable irrelevant), this becomes much easier to interpret:

Rules           <no prop> / prop: null / prop: {} / prop: { p: null }  prop: { p: '' }  prop: { p: 'a' }
required=false  (omitted)                                              p=               p=a
required=true   INVALID                                                p=               p=a

style: spaceDelimited and style: pipeDelimited

These are much the same as one another.

The specification states that these can only be used with array values. It also implies in its table of renderings that it can only be used with explode: false. On the other hand, that table has many errors in it, so I'm not sure it is a reliable source of information. The presumably non-normative swagger.io page ( https://swagger.io/docs/specification/serialization/ ) suggests that explode: true can be used, in which case this is the same as style: form with explode: true. In that case, see above for that discussion.

So, for the explode: false case:

Rules                           <no prop>  prop: null  prop: []  prop: [null]  prop: ['']  prop: ['a']  prop: ['a', 'b']  prop: ['a', '', null]
required=false, nullable=false  (omitted)  INVALID     prop=     prop=         prop=       prop=a       prop=a|b          prop=a|
required=true, nullable=false   INVALID    INVALID     prop=     prop=         prop=       prop=a       prop=a|b          prop=a|
required=false, nullable=true   (omitted)  prop=       prop=     prop=         prop=       prop=a       prop=a|b          prop=a|
required=true, nullable=true    INVALID    prop=       prop=     prop=         prop=       prop=a       prop=a|b          prop=a|

These styles do not claim RFC6570 allegiance. Yet, they create the same deserialization ambiguities that other array serializations produce. (Not to mention that having a | or a space in your value is going to lead to data corruption).
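
The delimiter problem is easy to reproduce. This sketch assumes the items are joined as-is, without escaping the delimiter inside a value; the function names are illustrative:

```python
from typing import List


def serialize_pipe(name: str, items: List[str]) -> str:
    """style: pipeDelimited, explode: false, with no escaping of '|'."""
    return f"{name}=" + "|".join(items)


def deserialize_pipe(name: str, serialized: str) -> List[str]:
    prefix = f"{name}="
    if not serialized.startswith(prefix):
        raise ValueError(f"expected a value for {name}")
    return serialized[len(prefix):].split("|")


original = ["a|b", "c"]
round_tripped = deserialize_pipe("prop", serialize_pipe("prop", original))
# round_tripped now has three items, not the two that went in
```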

By now, you'll not be surprised that if we take the same approach as for the RFC6570 cases above, all the ambiguity gets pulled into the leftmost column and the nullable property is no longer needed. Or you can ignore RFC6570 and use a bunch of constraints to avoid the problems.

style: deepObject

The pattern for deepObject is much the same as for pipeDelimited etc. I'm sure you won't mind not seeing yet another table here.

Cookie Parameters

Cookie parameters only allow style: form. Ultimately, everything said above about query parameters applies here, too.

Header Parameters

Header parameters only allow style: simple. The serialization is slightly different from what we saw with style: form, above, because the property name is not prepended.

Primitive values

Note that this may or may not apply to header parameters, since the OAS spec says in one place that this only applies to arrays, and in another says it applies to primitives, arrays and objects.

<no prop> prop: null prop: '' prop: 'a'
required=false
nullable=false
INVALID a
required=true
nullable=false
INVALID INVALID a
required=false
nullable=true
a
required=true
nullable=true
INVALID a

The OAS specification says that empty is n/a in its example table. I'm not sure what to make of that, particularly given (again) the number of errors in that table, and the fact that the only other mention of n/a is for allowEmptyValue which in turn is only for query parameters.

In this case, even if we take a strict RFC6570 interpretation (thereby rendering nullable meaningless), we still end up with:

<no prop>
prop: null
prop: '' prop: 'a'
required=false a
required=true INVALID a

Which means that when required is false, we can't distinguish between no property, null and empty.

Both this and the non-strict RFC6570 interpretation require constraints to provide an unambiguous interpretation.

For arrays and objects

Evaluating these is left as an exercise for the reader.

Path Parameters

Path parameters have one distinct difference from other parameters. Their required property must be true.

style: simple

Path parameters can be serialized with style: simple, which was discussed above. Because required must be true, this would leave us with:

<no prop> prop: null prop: '' prop: 'a'
required=true
nullable=false
INVALID INVALID a
required=true
nullable=true
INVALID a

Strict RFC6570 interpretation leaves this unambiguous. Non-strict requires constraints to avoid data corruption.

style: label

These are almost the same as style: simple, except that . is used as a delimiter, and when a property has an empty value, it is written as . rather than an empty string.

All the discussion from the style: simple case can be applied here.
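
For a primitive string value, the difference between the two styles can be sketched like this (illustrative names; only primitive values are covered):

```python
from urllib.parse import quote


def serialize_simple(value: str) -> str:
    """style: simple for a primitive: just the encoded value."""
    return quote(value, safe="")


def serialize_label(value: str) -> str:
    """style: label for a primitive: the value prefixed with '.'."""
    return "." + quote(value, safe="")
```

An empty value gives "" under simple but "." under label, so with label the empty case at least leaves a visible trace in the path.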

style: matrix

These are almost the same as style: form, except that the delimiter is ; rather than &, and when a property has an empty value, it is written as prop rather than prop=.

You can borrow the discussion from any other location to here.
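
A sketch of the primitive case, following the description above; the function names are illustrative and only string values are handled:

```python
from urllib.parse import quote


def serialize_matrix(name: str, value: str) -> str:
    """style: matrix for a primitive: ';name=value', or ';name' when empty."""
    if value == "":
        return f";{name}"
    return f";{name}={quote(value, safe='')}"


def serialize_form(name: str, value: str) -> str:
    """style: form for a primitive, for comparison: always 'name=value'."""
    return f"{name}={quote(value, safe='')}"
```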

Conclusion

I find it hard to believe you actually read this far. If you did, and have comments, I welcome them!

spacether commented 2 years ago

Can we get some guidance on this? Should we send %00 when serializing null as this post suggests?

From my reading of RFC 6570, an empty list, an empty dict, and null are all interpreted the same way. Perhaps the best way forward is to omit sending those parameters when they have empty values like [], {}, and null, and then provide a server default. That solution works for query parameters.

When undef is None, [], or {}, having {undef} in a path is rendered as an empty string for style: simple. Server side, for path usages, there is no way to tell these values apart: empty string, empty list, empty dict, or None.

About your nullability interpretation: nullable: false does not exclude null, so your interpretations above are not correct. There is an issue that discusses it if you search for it. nullable: true DOES allow type null.

spacether commented 2 years ago

Some Possible Paths forward here, which would require breaking changes to openapi:

  1. Disallow clients from sending (RFC-defined) undefined values. Undefined values are: [], {}, null, and any dict where all values are null (and should probably include a list where all values are null). The allowEmptyValue boolean looks similar to that, but empty string is a valid value and could always be deterministically interpreted as empty string if we disallow undefined values.

  2. Require that null always be sent as %00. This would mean that empty [] and {} would be the only use cases that have undefined, unclear behavior.

Note: If one really needs the ability to send all of this info (null/empty list/empty dict) in headers etc they could send it with json content-type serialization using the content map.
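
A rough sketch of that idea for a query parameter, assuming it is described with a content map for application/json and the JSON text is percent-encoded as the value. The function names are hypothetical:

```python
import json
from typing import Any
from urllib.parse import quote, unquote


def serialize_json_param(name: str, value: Any) -> str:
    """Send the whole value as compact, percent-encoded JSON text."""
    payload = json.dumps(value, separators=(",", ":"))
    return f"{name}={quote(payload, safe='')}"


def deserialize_json_param(name: str, serialized: str) -> Any:
    prefix = f"{name}="
    if not serialized.startswith(prefix):
        raise ValueError(f"expected a value for {name}")
    return json.loads(unquote(serialized[len(prefix):]))
```

Because null, [], {}, and "" each have their own JSON text, they stay distinguishable on the wire, unlike in the style-based serializations.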

handrews commented 5 months ago

While it may not be the ideal solution, the guidance in PR #3840, which is basically "define it as a string and have your application pre-format it", is really the only thing I can think of short of inventing a whole new standard for stringification.

I don't think anyone has the time/resources to create such a standard, but there are also many reasons to allow the ambiguity. OAS's success is partially due to being able to describe existing APIs, which no doubt handle this in a variety of contradictory ways. I think the best we can do is highlight which RFCs are relevant at which time (see also #3818 and other forthcoming PRs in this area, particularly around percent-encoding and using content vs schema and the equivalents in the Encoding Object) and advise API designers to avoid relying on non-interoperable behavior. That means there's no ability to delegate this to common OAS tooling, but we can't nail everything down, even if we wanted to.

handrews commented 5 months ago

Oh and see also #3812 regarding allowEmptyValue – the linked issue explains more of how we figured out what that was supposed to do. It long pre-dates the RFC6570-based approach to parameters.

handrews commented 4 months ago

PR merged for 3.0.4 and ported to 3.1.1 via PR #3921! This has been addressed by the new Appendices B and C.