OAI / sig-moonwalk

Version 4.x of the OpenAPI Specification is known as "Moonwalk," and has a goal to ship in 2024.
https://www.openapis.org/blog/2023/12/06/openapi-moonwalk-2024
Apache License 2.0
278 stars 13 forks source link

Schema/Parameter default value #112

Closed jbee closed 3 months ago

jbee commented 11 months ago

I was surprised to find that the default value of a parameter needs to be expressed as default property on the parameter's schema. This confuses type default with endpoint parameter defaults. The type might not have a default but a parameter does.

Looking at schemas as on-off type descriptors there is no issue. However, usually implementations map the schema from a language type. E.g. mapping Java's Integer as integer. Such mappings would be held in a map source language type => Schema for reuse and identification of "the same type". If now the default, which is endpoint specific, is encoded as part of the schema this relation is broken and one would need to construct a key record with the source language type and the default value as components. This (hopefully) makes it clear that the default value in not a property of the schema but of the parameter and therefore it should be allowed to declared as field of the parameter directly.

Also when searching for the description of the default property of a Schema Object I could not find an actual declaration in any of the 3.x versions. But there are examples using it.

jbee commented 11 months ago

In addition the view that the default can be seen as a property of the parameter's type would be inconsistent with the required property of the parameter. If the former is the case than the required should equally be a contrarian of the type not the parameter itself.

handrews commented 4 months ago

@jbee required is outside of the parameter schema because in JSON Schema, required lives above the level of the individual property. If there were a single overarching schema for parameters (as is being considered for 4.0 over in the OAI/sig-moonwalk repository), this would be more clear. Arguably, default in JSON Schema should behave the same way, and it has in fact been argued that it should change or have a higher-level keyword added.

So while I agree with you that there's a mismatch between the handling of default and required, and that the required handling is more correct, the situation is inherited from JSON Schema. And likely to be handled better in 4.0.

Changing this in 3.x would add work for implementors. Is there anything that the current situation prevents you from doing? If not, I think this is best closed as being more-or-less handled in Moonwalk by using one schema for all parameters:

jbee commented 4 months ago

Changing this in 3.x would add work for implementors.

Well, everything is a question of perspective. Being an implementer myself, let me give you a perspective where providing default on parameters means a lot less work and headache.

Is there anything that the current situation prevents you from doing?

Yes, given the current situation not adding defaults to the generated documents in some cases (even though the information is there) is the best trade-off. Let me elaborate.

One of the main challenges we see with OpenAPI is scale. That is mostly the size of the generated document and the inability of many of the tooling out there to handle large(r) documents. The second scale issue is time, that is the time required to generate the document(s) from source code.

How is this related to defaults on parameters or not? In several ways.

The main way to keep documents small is to avoid restating parts that can be shared. Mostly it means to use schema references as much as possible. Ideally most of the types that aren't just simple "primitives" are modeled as "named" schema types that are referenced throughout the document. And this aligns wonderfully with the semantics of (static) types like classes in the Java language. So the obvious and easy thing to do is to have a 1:1 relation between a Java class and a OpenAPI named schema. However, if one wanted to maintain all defaults stated they would need to go into the schema declaration. This means schemas cannot be reused but would need to be one-offs that restate all the schema details for each occurrence. Many types usually occur hundreds or even thousand of times in larger documents. This means the size of the document increases 1-2 orders of magnitude and given that type analysis is a major part in the source code to OpenAPI mapping the time required to generate equally would see a 1-2 OOM increase. This is not acceptable just to be able to provide the default information so we omit is should it have the consequence that we would need to restate schemas (inline schemas). But doing that requires extra code and eats fair share of the complexity budget for the little gain of sometimes having defaults if that does not interfere with schema reuse.

Lets look a practical example to illustrate the point. A timestamp value is likely to be modeled using Java's Instant. Now users may be allowed to provide the instant in form of a (epoch) number or a date-time string of some sorts. In OpenAPI schema this might look like this:

"Instant":{
        "oneOf":[
        {
          "type":"integer",
          "format":"int64"
        },
        {
          "type":"string",
          "format":"date-time"
        }
        ]
      }

Now, imagine a parameter of such an instant that has a default, say year 2K. If that isn't defined on the parameter there are several problems. The main problem, with such default on the schema it becomes non reusable as other timestamps do not have such default and thus such a declaration would need to be in-lined and thus restate a lot of it just to add the default. The second problem is that such default would be stated in the source in form of the input a user would provide. So a string 2000-01-01 or something like that. Were would that even go now? Just on the "string" oneOf part is not correct. Both would need it but how would the code know how to convert such default to a number that could be added to the "integer" case?

TLDR; This is just a nightmare to deal with. It causes poor performance and/or adds unnecessary complexity and all it would need to be simple and straight forward is to allow defining a default on parameter.

That said a default on schema does have a use case. It is just a entirely different one. Types in a source language, like Java primitives, might have a language defined default. boolean is always false and such. This is something that is true on the type level and thus should be defined on the schema (unless of course a default on a parameter would "overrride" it).

Last but not least let me also point out that defaults and required are tightly connected. A parameter with a default is obviously not required. There should be no need to state the obvious. Because another way to limit size of the generated document is to avoid stating something that would be implicitly assumed, like "required": false.

handrews commented 4 months ago

@jbee thanks for providing all of this detail!

It sounds like, fundamentally, this is a problem with JSON Schema's default being inside the property rather than analogous to required. The case of potentially having a default in the Parameter Object is just a special case of that, just as having required in that object is a special case of supporting required across a set of things that are not all one JSON Schema.

The JSON Schema missing/defaultProperties/propertyDefaults/itemDefaults proposal that I linked to would solve this, and would naturally motivate including that keyword at the Parameter Object level.

You did not see default declared in OAS 3.x because it is inherited from the JSON Schema specification. There might be some commentary on it in 3.0, but in 3.1 we gained full compatibility with JSON Schema and avoided mentioning anything that was sufficiently well-defined in JSON Schema draft 2020-12.

In general, we don't want to be adding more OAS-specific keywords to JSON Schema, although we have not definitively decided to never do it. Of course, you can make your own JSON Schema extension and use it, particularly as there aren't any validation impacts (but there's always the question of tooling support).

I'll tag this for TSC review as it's a good case for deciding if there are circumstances where we'd want to expand our small set of custom schema keywords. The fact that we'd want a companion keyword in the Parameter Object ties it to OAS more clearly than most such proposals. Also, it's unclear that there will be movement on the JSON Schema side anytime soon. If we did add a new keyword (in either the Schema Object or Parameter Object), it would go in 3.2.

jbee commented 4 months ago

My certainly very unpopular opinion is that it is a mistake to base one spec upon another as that by nature removes control from the very thing that should capture the essence of a solution to the original problem. It will push and limit the design in certain directions that might be far from where a good solution is found. Like in this case of the default which is straight forward as a information about the API behavior, yet because OpenAPI brought in the ballast of JSON schema this now becomes something different and difficult to decide and implement and there is even a good chance that the semantics it finally gets are not very helpful for the problem of describing how an API works.

handrews commented 4 months ago

@jbee you will be happy to know that we're actively discussing de-coupling from any specific schema technology in OAS 4 "Moonwalk". I don't think there's a specific discussion for it in the OAI/sig-moonwalk repo right now, but it's mentioned in several discussions related to schemas, and we've been talking about it in the Tuesday morning Moonwalk meetings a lot.

We don't want to make up our own schema technology, but (in my view) we need a better description of the interface between schemas and OpenAPI "proper", which I think would highlight when aspects of the schema system leak out into OpenAPI portions (as with required in the Parameter Object, although that is likely solved in Moonwalk by modeling all parameters with a single schema).

karenetheridge commented 4 months ago

I agree that default belongs as a property of parameter objects, just as required is -- the default in json schema is something else entirely really (and as Henry mentioned above, doesn't really work as intended anyway as it is just an annotation, and annotations are not provided for missing properties).

If default existed as a first-order property of parameter objects (and it could also be added for requestBody and response as well), it should be made clear that it is not to be confused with any default property in json schema (and indeed we can recommend against using that keyword in schemas).

mikekistler commented 4 months ago

At a practical level, can't default be specified separately from the "type" (base schema) and combined with allOf? E.g.

    parameters:
    - name: foo
       in: query
       schema:
         allOf:
         - $ref: '#/components/schemas/FooType'
         - default: 'bar'

    components:
      FooType:
        type: string
jbee commented 4 months ago

@mikekistler are you deliberately trying to sabotage or am I missing a joke here? April 1st has also been :thinking:

mikekistler commented 4 months ago

Apologies if I missed the point here. I was honestly trying to be constructive.

jbee commented 4 months ago

I had hopes that I made a compelling argument why it is wrong to put default on the "type" and why it is a property of the parameter. Putting it on the type but in a different way that is even more unnecessary complex is creative but not helpful :sweat_smile: So I felt compelled to shoot it down before someone says "Lets do that" :joy: Sorry!

mikekistler commented 4 months ago

I guess my confusion here is with the term "type". I know there is a type keyword in JSON schema, but you seem to be using the term to mean something else. Is that right, and if so what do you mean by "type"?

jbee commented 4 months ago

I guess my confusion here is with the term "type"

Because with regards to OpenAPI a schema is the equivalent of a type descriptor. The language we agree upon to describe data structures which in programming languages are modeled by types. Specifically as we do use OpenAPI to represent an API that is implemented in some language (unless that language is untyped or dynamically typed) there should be a 1:1 relation between a type in the source language and the OpenAPI schema we use to communicate said type. As I said earlier it is vital to maintain this 1:1 relation for reuse to keep the size of the resulting documents in check.

Yes, there is much more in JSON schema. That is another reason why it is not a good fit to be used in OpenAPI because most of the features of JSON schema are unnecessary complexity in OpenAPI as no language supports types that would be able to model this. The assumption that an implementation adds a OpenAPI descriptor "on top" of the code is another flawed idea. Anyone that thinks this is feasible has only ever worked with toy examples. In many real systems the sheer surface of the web API is so huge that it would be out of question to maintain a spec that layers on to of the functional code that constantly has to be checked to be "in sync" with what the code actually does and says. The only chance to have a OpenAPI document that is mostly correct is to derive the document from the code directly. And in that process the language types are becoming the schemas so that is why I call them types because that is what they are.

handrews commented 4 months ago

@jbee

Yes, there is much more in JSON schema. That is another reason why it is not a good fit to be used in OpenAPI because most of the features of JSON schema are unnecessary complexity in OpenAPI as no language supports types that would be able to model this.

As arguably the most prolific contributor to the JSON Schema spec in the past decade, I've been trying to argue this point (in agreement with you) for quite some time. I continue to argue the point in the context of OpenAPI 4 (Moonwalk). [EDIT: Although I restrict this argument to the code generation side- there are other runtime things where JSON Schema is a good fit, but it's been conflated with and applied to too many things.]

jbee commented 4 months ago

@handrews

I've been trying to argue this point (in agreement with you) for quite some time

Haha, nice. Arguing for practical simplicity sadly often only finds deaf ears. But I am happy to hold many unpopular opinions :joy: Unfortunately I also tend to express them a bit blunt which isn't always appreciated :sweat_smile:

When it comes to JSON Schema I did implement the validation part in https://github.com/dhis2/json-tree and trying to understand the entirety of JSON Schema just enough to decide where to draw the line was very challenging to say the least.

handrews commented 4 months ago

@OAI/tsc review request:

I think that this boils down to requesting a default field at the Parameter Object level, separate from the Schema Object default. So the question is:

  1. Should we do this in 3.x? (4.0 will change enough that this won't be a direct analogue, and it should be discussed in the OAI/sig-moonwalk repo anwyay)
  2. If so, how to proceed? I suspect this will require writing a proposal for 3.2, because the Parameter Object is already complex, and this would appear to have an interaction with the Schema Object default, which needs to be made very clear

If not, let's close this - there's definitely a real concern and a valid proposed solution, we just need to decide if this is worth putting into 3.x in some way or if we should just consider it holistically in 4.0.

ralfhandl commented 3 months ago

I'd wait until we know how this will look in Moonwalk, then decide whether and how to downport it.

handrews commented 3 months ago

After discussion in the TDC call, I'm going to move this to moonwalk (by moving repos and then converting to a discussion - I'll edit the first comment to add a note explaining that it was filed in the context of 3.x).

@jbee this does not necessarily mean we won't fix it in a later 3.x, but we would rather address it holistically in 4 and then backport whatever makes the most sense.