OAI / OpenAPI-Specification

The OpenAPI Specification Repository
https://openapis.org
Apache License 2.0
28.9k stars 9.07k forks source link

Are these OpenAPI 3 paths ambiguous? #2564

Open bfreuden opened 3 years ago

bfreuden commented 3 years ago

As suggested by @MikeRalphson on Stackoverflow, I'm asking the question here as well.

Are those OpenAPI 3 paths ambiguous?

/shops/{shopId}/pets/{petId}   
/shops/{shopId}/pets/_search

I want to answer no but, strictly reading the spec, I can't decide because they seem to fall into none of the 3 statements made by the spec:

  1. Neither path is concrete (term used in the spec)
  2. Paths don't seem to meet the Templated paths with the same hierarchy but different templated names criteria (that is not very clear to me, here is my understanding: "/shops/{}/pets/{}" != "/shops/{}/pets/_search")
  3. Paths do not look like the ambiguous example

In addition to the question asked on Stackoverflow, let me ask two additional questions (below).

Should the OA3 spec be improved?

@MikeRalphson's reading of the spec: path are not ambiguous because one is more concrete than the other.

If paths are indeed not ambiguous, then the more concrete notion might need to be defined.

How could the OA3 spec be improved?

We might add an example like this:

Assuming paths sharing a common and identical prefix, /shops/{shopId}/pets, the more concrete definition, /shops/{shopId}/pets/_search, will be matched first if used:

  /shops/{shopId}/pets/{petId}
  /shops/{shopId}/pets/_search

Or we might only show minimalistic examples involving templated names, and say that they also apply in case of common and identical prefixes:

First statement (concrete vs template case):

  /{otherPlace}
  /here

Second statement (considered identical and invalid):

  /{id}
  /{name}

Third statement is left unchanged (ambiguous resolution):

/{entity}/me
/books/{id}

Related excerpt of the OA3 spec

The "Paths object" paragraph of the OpenAPI 3 specification (https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.0.0.md#paths-object) is stating (3 sentences, 3 statements):

When matching URLs, concrete (non-templated) paths would be matched before their templated counterparts. Templated paths with the same hierarchy but different templated names MUST NOT exist as they are identical. In case of ambiguous matching, it's up to the tooling to decide which one to use.

Those 3 statements are followed by 3 examples (and that's it):

Assuming the following paths, the concrete definition, /pets/mine, will be matched first if used:

  /pets/{petId}
  /pets/mine

The following paths are considered identical and invalid:

  /pets/{petId}
  /pets/{name}

The following may lead to ambiguous resolution:

/{entity}/me
/books/{id}
rwalle61 commented 3 years ago

Thanks for raising this @bfreuden, I believe this issue is similar to https://github.com/OAI/OpenAPI-Specification/issues/2356 - would be great to get an official OpenAPI answer, then we know what to implement in our tool :)

bfreuden commented 3 years ago

In case it may help, I've tested a few openapi-based server frameworks (Nodejs express-openapi, Python fastapi, Java Vert.x, Java Micronaut) and all of them seem to consider that those paths are not ambiguous.

You can find the code here: https://github.com/bfreuden/openapi-validation-tests

The code is including a test suite that is running manually crafted (successful) tests and openapi-based (failing) tests. Openapi-based tests are failing because they are based on a strict interpretation of the OpenAPI specification, so their behavior is unpredictable with regards to /shops/{shopId}/pets/{petId} and /shops/{shopId}/pets/_search.

To be honest, at this point I'm wondering where the OpenAPI specification is drawing (and should be drawing) the line between what is defined in the specification, and what is left to the implementations (server frameworks).

Maybe it is too late for the OpenAPI specification to become clearer on the subject since implementations have started making decisions on what is ambiguous and what is not: I've picked 4 frameworks that are within my programming skills, but there maybe exist other frameworks considering those paths are ambiguous.

darrelmiller commented 2 years ago

I believe the spirit of the original wording was that constant path segments that align with templated path segments are not ambiguous. Updates to the wording around that to make it more clear would be a good thing.

We haven't attempted to make the wording more precise to date because we have been looking at alternative ways to allow more ambiguity but provide tooling with a deterministic way of identifying the best path to match. This was part of the ongoing conversation about enabling multi segment parameters and optional path segments.

Unfortunately that conversation stalled a bit around the topic of what are the highest priority items that we should be working on. The irony.

bfreuden commented 2 years ago

@darrelmiller Thanks for your comment. I agree it is not easy to find a good wording.

You will find below some thoughts about it.

First it might be worth defining the notion of identical paths:

Templated paths with the same hierarchy but different templated names MUST NOT exist as they are identical. https://spec.openapis.org/oas/v3.0.3#paths-object

because it seems confusing (see https://github.com/OAI/OpenAPI-Specification/issues/2356) and I admit I can only guess the meaning (I have a good guess but that's only a guess).

I guess the algorithm is:

function identicalPaths(oapth1, oapath2) {
  const pathParamRegex = /\{[^}]+\}/ 
  return oapth1.replaceAll(pathParamRegex, "{}") == oapth2.replaceAll(pathParamRegex, "{}")
}

A definition could be:

Paths are considered identical if they are lexicographically equal after removing their curly braces delimited template expressions (or after replacing them with a constant).

An example could be:

Those paths are identical:

/{pet}/name
/{owner}/name

Because replacing their delimited template expressions with the {} constant leads to the same string: /{}/name

Second it might be worth defining the notions of path, parent path and direct parent path.

A definition of path could be (inspired from https://en.wikipedia.org/wiki/URL):

A path is a sequence of path segments separated by a slash (/) and starting with a slash (/). A path segment MUST NOT contain any slash (/).

Some examples could be:

These are paths:

/
/{pet}
/{pet}/name
/faq

They are containing these path segments: {pet}, name, faq

A definition of parent path could be (inspired from https://en.wiktionary.org/wiki/subpath):

A subpath is a path relative to another path, called a parent path, defined by adding one or more segments at the end of that path. A direct subpath is a subpath defined by adding a single path segment at the end of a path, called the direct parent path.

Some examples could be: /faq is a subpath of / /{pet}/name is a subpath of / but is NOT a direct subpath of / /{pet}/name is a direct subpath of /{pet} / and /{pet} are parent paths of /{pet}/name /{pet} is the direct parent path of /{pet}/name

Third (and last) it might be possible to rephrase this:

When matching URLs, concrete (non-templated) paths would be matched before their templated counterparts.

Into:

When matching URLs, if two paths have identical direct parent paths then the most concrete (least-templated) path would be matched before the other.

karenetheridge commented 2 years ago

Second it might be worth defining the notions of path, parent path and direct parent path.

I'm not sure that any of these definitions of path, path segment etc are relevant. Path placeholders can appear anywhere, not just between slashes, so an OpenAPI implementation that parses a request URI and tries to match it against a list of path specifications (the properties under /paths in an OpenAPI document) can treat them as opaque strings -- it is not terribly helpful to split them up by path segment. [I've tried! it didn't help once I considered paths with multiple placeholders per segment, or a placeholder that crosses a directory boundary, etc.]

Paths are considered identical if they are lexicographically equal after removing their curly braces delimited template expressions (or after replacing them with a constant).

But this I like! 💯

bfreuden commented 2 years ago

@karenetheridge Thanks for your comment!

I'm not sure that any of these definitions of path, path segment etc are relevant.

I agree: I pressed the Comment button with a feeling I missed the principle of parsimony objective.

However if what I described above (awkwardly but accurately, right?) is true (the authors of the specification have to decide that) I think we need to define two things:

[1] multiple placeholders per segment [and 2] placeholder that crosses a directory boundary

Now I understand I forgot taking into consideration:

I didn't know the spec allowed case [2]: that's quite a wild thing :). Do you confirm it is the case?

I'd be curious to know how many libraries implement that. The one I'm using (Eclipse Vertx) does not :).

karenetheridge commented 2 years ago

I don't think your case [2] is allowed -- "The value for these path parameters MUST NOT contain any unescaped “generic syntax” characters described by [RFC3986]: forward slashes (/), question marks (?), or hashes (#)." in https://spec.openapis.org/oas/v3.1.0#path-templating. But perhaps the authors meant to say "the names for these path parameters"? Value (as in, populated from the actual request URI) sounds more like it, because # and ? should be excluded because they are markers for fragments and query parameters, respectively.

If "value" is correct here, this is convenient for writing a path-matching algorithm, for one can split the list of possible path specifications by / and also split the incoming URI path by / to give individual sections to match against -- and if they are sections that do not contain a template ({..}) then matching is straightforward, and lets us drill down to a small number of possible path items to match templated sections against.

(Although personally I've found that replacing each templated {...} with named captures, for inserting into a single regular expression, works well enough - e.g. /foo/{bar}-{baz}/{bloop} becomes ^/foo/(?<bar>[^/]+)-(?<baz>[^/]+)/(?<bloop>[^/]+)$ and the matched values show up in variables bar, baz and bloop (if a match was found).