OAI / Overlay-Specification

The OAI Overlay Specification

Canonical and Extended Forms for OpenAPI Specifications #37

pjmolina opened this issue 5 years ago

pjmolina commented 5 years ago

Canonical and Extended Forms for OpenAPI Specifications

Following the discussion at the last TSC about Traits, Overlays and/or Mixins, we can all agree that these features are strongly oriented toward extensibility.

To keep things as simple as possible for implementers, I want to propose the idea of having two levels of the specification, defined as follows:

  1. Let's call the Extended Form any OpenAPI document containing Traits, Overlays, Mixins, $refs, or any other macro-like indirection feature (quite useful for avoiding repetition and being as concise as possible).
  2. Conversely, let's call the Canonical Form an OpenAPI document with all these features fully resolved: all indirections coming from overlays, traits, mixins and $refs are resolved into a single tree in a single file/document (see the sketch below).
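
To make the distinction concrete, here is a minimal sketch; the file name pet-shared.yaml and the Pet schema are hypothetical, and only $refs are shown since the overlay/trait syntax is not yet specified.

Extended Form (the indirection is still present):

    paths:
      /pets:
        get:
          responses:
            '200':
              description: A pet
              content:
                application/json:
                  schema:
                    $ref: "pet-shared.yaml#/components/schemas/Pet"

Canonical Form (the same operation, with the external indirection resolved into a single self-contained document):

    paths:
      /pets:
        get:
          responses:
            '200':
              description: A pet
              content:
                application/json:
                  schema:
                    $ref: "#/components/schemas/Pet"
    components:
      schemas:
        Pet:
          type: object
          properties:
            name:
              type: string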

This will allow tool implementers to focus on:

This approach enables:

This can help tool implementers embrace OpenAPI 3.0 faster when targeting (2) or (1).

A divide-and-conquer strategy, that's it. What do you think?

tedepstein commented 5 years ago

@pjmolina , I think it's a good idea. We have already discussed the idea of making overlays a separate specification, and I think this is the prevailing direction of the TSC.

But even today, we find that there are code generators, documentation formats, test consoles, etc. that do not correctly handle some features of OpenAPI 2.0 and 3.0. Common stumbling blocks include:

We have our own KaiZen OpenAPI Normalizer to smooth out these problems for reliable downstream processing. It's not a trivial operation, and functionality like this is only going to get more important as we start adding traits, overlays, alternative schemas, and other features.

Having Extended and Canonical forms more clearly defines the role that tools like Normalizer can fulfill, as translators from Extended to Canonical form. And it removes a significant barrier to adoption of new OpenAPI versions.

Most OpenAPI usage is read-only. Consumers of OpenAPI only need to read and comprehend the API document; they won't care about how it has been composed internally. If we can separate roles, so that OpenAPI consumers don't have to be responsible for piecing together the API description from its constituent parts, I think that would be a big win.

pjmolina commented 5 years ago

You nailed it, @tedepstein! It looks like we have experienced the same kind of pain. ;-)

MikeRalphson commented 5 years ago

How will the canonical form represent circular schemas in a document if all $refs are to have been resolved? JSON has no mechanism to support this, and we ban the use of the related YAML features.

pjmolina commented 5 years ago

Fair point, @MikeRalphson:

Example: Recursive and circular references in Schema Types.

Any other use cases where circular refs could be a problem?
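
As a small illustration (this TreeNode schema is hypothetical, not from the thread), even a single directly self-referential schema cannot be fully inlined without an infinite expansion, so a Canonical Form presumably has to let at least local references remain:

    components:
      schemas:
        TreeNode:
          type: object
          properties:
            value:
              type: string
            children:
              type: array
              items:
                # a direct self-reference: inlining it would never terminate
                $ref: "#/components/schemas/TreeNode"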

tedepstein commented 5 years ago

@pjmolina , @MikeRalphson , here are some excerpts from the Normalizer docs that explain how this works:

When the normalizer encounters any reference, there are two ways it may process the reference:

Inline: The normalizer retrieves the referenced value (e.g. the Pet schema definition object) and replaces the reference itself with that value.

Localize: The normalizer first adds the referenced object to the normalized spec that it is creating, if it is not already present, and then replaces the reference with a local reference to that object. So in the external reference example shown above, the Pet schema definition would appear directly in the OpenAPI spec produced by the normalizer, and references that were formerly external references would become local references.
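
The external reference example referred to above is not included in this excerpt; as a rough sketch of the two modes, assuming a hypothetical external file pets.yaml that defines a Pet schema:

    # original (external) reference
    schema:
      $ref: "pets.yaml#/components/schemas/Pet"

    # inlined: the reference is replaced by the referenced value itself
    schema:
      type: object
      properties:
        name:
          type: string

    # localized: Pet is copied into the normalized document's own
    # components/schemas, and the reference becomes a local one
    schema:
      $ref: "#/components/schemas/Pet"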

(snip)

Recursive References: It is possible to set up recursive schema definitions in OpenAPI specs, through the use of references. For example, consider the following schema:

      matriarch:
        $ref: "#/components/schemas/Person"
...

components:
  schemas:
    Person:
      type: object
      properties:
        name:
          type: string
        children:
          $ref: "#/components/schemas/People"
    People:
      type: array
      items:
        $ref: "#/components/schemas/Person"

The Person schema has a children property of type People, and the People schema defines an array of Person objects.

Naively attempting to inline a reference to a Person object would lead to a never-ending expansion...

To handle recursive references encountered during inlining, the normalizer stops inlining whenever a reference is encountered that is fully contained within another (inlined) instance of the referenced object. That recursive reference is localized rather than being inlined.

In the above example, we would end up with something like this:

Partially inlined:

    matriarch:
      type: object
      properties:
        name:
          type: string
        children:
          type: array
          items:
            $ref: "#/components/schemas/Person"
...

components:
  schemas:
    Person:
      type: object
      properties:
        name:
          type: string
        children:
          type: array
          items:
            $ref: "#/components/schemas/Person"
...

Here we see:

  • that the top-level reference to Person as the type of the matriarch property was inlined;
  • that the recursive reference to Person encountered while performing this inlining has been localized;
  • that the Person schema itself was subjected to inlining, with localization of its recursive reference.

There are other details of the algorithm for handling name clashes. There's also a somewhat misguided distinction between "conforming" vs. "non-conforming" references, which we're planning to eliminate in a future revision. So I would not propose the KaiZen Normalizer documentation, in its current form, as a baseline spec for Canonical Form.
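
As a purely hypothetical illustration of the name-clash problem (the file names and the Pet_1 renaming below are invented, not necessarily how the Normalizer actually behaves): if two external documents each define a schema named Pet, localizing both into a single document forces one of them to be renamed, e.g.:

    components:
      schemas:
        Pet:      # localized from store.yaml#/components/schemas/Pet
          type: object
        Pet_1:    # localized from clinic.yaml#/components/schemas/Pet, renamed to avoid the clash
          type: object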

But depending on our goals for Canonical Form, we may not need to specify the algorithm to this level of detail. Maybe it's sufficient to say that Canonical Form just means:

  1. There are no external references, traits or overlays.
  2. All cascading properties have been expanded down to their respective leaf levels.
  3. All default values are explicitly specified (see the sketch after this list).
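
A rough sketch of what points 2 and 3 could mean in practice (the limit parameter is hypothetical): a path-level parameter that cascades to every operation under its path would be repeated at the operation (leaf) level, and spec-defined defaults for a query parameter, such as required: false, style: form and explode: true, would be written out explicitly.

    # before
    paths:
      /pets:
        parameters:
          - name: limit
            in: query
            schema:
              type: integer
        get:
          responses:
            '200':
              description: OK

    # after canonicalization
    paths:
      /pets:
        get:
          parameters:
            - name: limit
              in: query
              required: false
              style: form
              explode: true
              schema:
                type: integer
          responses:
            '200':
              description: OK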

Different processors could accomplish this in different ways, and Canonical Form does not guarantee that the output will always be exactly the same, regardless of which processor you use.

OpenAPI consumers would still need to be able to resolve local references, expressed as JSON pointers within the document. And they would still need to deal with the possibility of recursive references. But they wouldn't need to deal with those other levels of complexity or general fussiness in the OpenAPI spec.

tedepstein commented 5 years ago

The more I think about this, the more I'm convinced that it's critical to the success of the OpenAPI ecosystem. I would go so far as to say that we should not introduce traits, a.k.a. mixins (#1843), into the OpenAPI spec unless we also define a canonical or simplified form.

Anecdotal evidence: OpenAPI 3.0 adoption took much longer than we hoped. Developers were waiting for tools and platform support; tool and platform providers were waiting for demand to reach critical mass; and there was no "killer app" to drive the ecosystem to OAS v3.

You could argue that OpenAPI 3.0 was different, because 3.0-to-2.0 conversions, which might have facilitated adoption by OAS consumers, were inherently lossy and therefore not a practical solution. By contrast, traits can be resolved by a preprocessor with no information loss, and we could just let the open source community build those preprocessors.

You could also argue that, whatever complexities might exist in OpenAPI, we can leave it to the open source community to build preprocessors like Kaizen OpenAPI Normalizer and others. We don't need to formalize it in the spec.

But I think these arguments fail to address the economics of the situation.

OpenAPI consumers are a broad category that includes documentation formats, test consoles, code generators, API gateways and API management platforms, among others. OpenAPI producers are a much smaller category that includes editors, code-first frameworks, design tools, and maybe a few others.

If I'm an OpenAPI consumer looking at a new release of the OpenAPI spec, my goal is to support that new release and advertise that support, with minimum effort. If it's difficult for me to support a new feature like traits (and it will be difficult), I have a few options:

  1. Bite the bullet and write the code to support traits.
  2. Advertise half-assed support for OpenAPI 3.1... without traits. If someone wants to use my service with a "traitful" OpenAPI document, it's up to them to pre-process and send me a traitless OAS 3.1 document.
  3. Look for open source processors to help by resolving the traits, maybe even converting 3.1 to 3.0 with some information loss.

The first two options are obviously not very attractive. The third option might seem fine. But consider what this means:

That's a big enough barrier to almost guarantee slow adoption of OpenAPI 3.1.

Now, if OpenAPI 3.1 officially defines a Canonical Form, even in very simple terms, it changes the economics pretty dramatically for me as an OpenAPI consumer:

Not that I've heard anyone raise a strong objection to this yet. But I think this is a simple and powerful way to reduce friction in the OpenAPI ecosystem.

MikeRalphson commented 5 years ago

"Different processors could accomplish this in different ways, and Canonical Form does not guarantee that the output will always be exactly the same, regardless of which processor you use."

I believe we would be creating problems for ourselves and tooling authors if we did not specify (with examples) exactly how overlays, traits/mixins and $refs should be resolved to a truly canonical form, whereby each conforming tool produces exactly the same output when canonicalizing the same input. See for example https://en.wikipedia.org/wiki/Canonical_XML

pjmolina commented 5 years ago

True, the semantics of the Extended Form should generate a unique Canonical Form. Moreover, we can provide a Test Suite to illustrate the expected input + expected output.
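
A sketch of how such a test suite could be laid out (the directory and file names are hypothetical): each case pairs an Extended Form input with the single Canonical Form output that every conforming tool is expected to emit.

    canonicalization-tests/
      circular-refs/
        input.extended.yaml       # Extended Form: $refs/traits/overlays still present
        expected.canonical.yaml   # the one Canonical Form all conforming tools must produce
      cascading-parameters/
        input.extended.yaml
        expected.canonical.yaml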

tedepstein commented 5 years ago

If the consensus is that we should go for this level of specificity, I don't object.

My position is that OpenAPI is already in need of a simplified or normalized form, whether or not it's strict enough to be called a Canonical Form, and that we should not introduce traits unless we also provide this.

If a simplified form is on the critical path to traits, as I believe it should be, I just want to make sure we have enough time to do it. I would rather have a "simplified form" done in time for a 3.1 release than a "canonical form" still in progress.

orubel commented 3 years ago

@tedepstein "If we can separate roles, so that OpenAPI consumers don't have to be responsible for piecing together the API description from its constituent parts, I think that would be a big win."

That's actually false. Roles cannot be second or third in line as a reference point. They are the first/second point of reference for the endpoint, so you can be in compliance with API3:2019 (https://apisecurity.io/encyclopedia/content/owasp/api3-excessive-data-exposure.htm) and API6:2019 (https://apisecurity.io/encyclopedia/content/owasp/api6-mass-assignment.htm).

The proper line of reference should be:

ENDPOINT > ROLE > REQUEST DATA
ENDPOINT > ROLE > RESPONSE DATA

Like so:

"user/update": {
    ...,
    "REQUEST": {
        "permitAll":["username","password","email"],
        "ROLE_ADMIN":["id"]
    },
    "RESPONSE": {
    "permitAll":["id","version"]
    }
}

It is impossible to rely on a separate security mechanism to do this, because it would not be making the check in association with the endpoint. So the check is missed entirely at the gateway, which is where security is enforced for the majority of applications that rely on OpenAPI.

So OpenAPI is 100% vulnerable to 2 of the 10 top API security issues.

kscheirer commented 2 years ago

Overlay has now been proposed, which supports traits. I think this answers the questions raised in this issue; please comment if there is more to resolve here.

orubel commented 2 years ago

Nice of you to minimize the fact that I pointed out a security risk, but the security risk still exists.