OAI / OpenAPI-Specification

The OpenAPI Specification Repository
https://openapis.org
Apache License 2.0

Optional and multi-segment path parameters #2653

Closed darrelmiller closed 8 months ago

darrelmiller commented 3 years ago

This is a meta issue representing a number of requests over the years, including #1459, #1840, and #892.

Currently the OpenAPI specification allows neither optional path parameters nor path parameter values that contain characters such as /. This means that if you had the following API description:

  /myfiles/{mypath}/{filename}:
    parameters:
      - name: mypath
        in: path
        required: true
        schema:
          type: string
      - name: filename
        in: path
        required: true
        schema:
          type: string

then you would not be allowed to have mypath = "a/path/to/a/file" unless you escaped the forward slash character.
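To make "escaping the forward slash" concrete: the value has to be percent-encoded so that each / becomes %2F before it is substituted into the path. A minimal sketch using Python's standard library; the file name is a hypothetical value, not something from this issue.

```python
from urllib.parse import quote, unquote

mypath = "a/path/to/a/file"
filename = "report.txt"  # hypothetical file name, for illustration only

# safe="" makes quote() percent-encode "/" as well; by default "/" is left alone.
escaped = quote(mypath, safe="")
url_path = f"/myfiles/{escaped}/{quote(filename, safe='')}"

print(url_path)          # /myfiles/a%2Fpath%2Fto%2Fa%2Ffile/report.txt
print(unquote(escaped))  # a/path/to/a/file
```

The server then has to decode %2F back to / after routing, which is exactly the step many clients and intermediaries get wrong.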

Also, the RFC6570 URI Template /myreports/{reportName}{/nonDefaultFormat} where the last path segment is optional is not supported by OAS.
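As a rough illustration of what that template means, here is a sketch of expansion for a tiny subset of RFC 6570: only {var} and {/var} with single string values. The function name and the sample values ("sales", "pdf") are assumptions made for this example; a real implementation should use a full URI Template library.

```python
import re
from urllib.parse import quote

def expand(template: str, values: dict) -> str:
    """Expand {var} and {/var} only; undefined variables expand to nothing."""
    def replace(match):
        operator, name = match.group(1), match.group(2)
        value = values.get(name)
        if value is None:
            return ""  # an undefined variable contributes no text at all
        encoded = quote(str(value), safe="")
        # The "/" operator prefixes the value with a path separator.
        return "/" + encoded if operator == "/" else encoded
    return re.sub(r"\{(/?)(\w+)\}", replace, template)

template = "/myreports/{reportName}{/nonDefaultFormat}"
print(expand(template, {"reportName": "sales"}))
# /myreports/sales
print(expand(template, {"reportName": "sales", "nonDefaultFormat": "pdf"}))
# /myreports/sales/pdf
```

Both results are produced by one template, which is what "the last path segment is optional" means here.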

The primary reason for not supporting these types of APIs is that it creates the potential of an ambiguous match between a URL and the corresponding path item. Some tooling depends on being able to identify the API description for a specific URL. If multiple pathItems match, then some kind of alternate selection algorithm must be defined.
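A small sketch of the ambiguity, assuming two path items that have been hand-translated into regular expressions. The second path item (/myreports/{reportName}/summary) is hypothetical and only exists to produce the clash.

```python
import re

# Hypothetical path items, hand-translated to regular expressions.
PATH_ITEMS = {
    # Optional trailing segment, as in {/nonDefaultFormat}.
    "/myreports/{reportName}{/nonDefaultFormat}":
        re.compile(r"^/myreports/([^/]+)(?:/([^/]+))?$"),
    # An ordinary path item with a literal last segment.
    "/myreports/{reportName}/summary":
        re.compile(r"^/myreports/([^/]+)/summary$"),
}

def matching_path_items(url_path: str) -> list:
    """Return every path item whose pattern matches the URL path."""
    return [name for name, rx in PATH_ITEMS.items() if rx.match(url_path)]

print(matching_path_items("/myreports/sales"))
# ['/myreports/{reportName}{/nonDefaultFormat}']
print(matching_path_items("/myreports/sales/summary"))
# ['/myreports/{reportName}{/nonDefaultFormat}', '/myreports/{reportName}/summary']
```

The second URL matches both path items, so a tool needs some tie-breaking rule to decide which description applies.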

There have been a number of suggestions on how that selection algorithm might work, with varying levels of complexity. The open question is whether there is enough community demand to justify the work necessary to find an acceptable solution.

If you have real-world scenarios where adding support for either multi-segment path parameters or optional path segments would make your life easier, please share them in this issue.

xuorig commented 3 years ago

We have a few use cases when it comes to GitHub API v3. The current workaround is to annotate multi-segment parameters with an extension, but this is far from ideal as far as tooling support and specification adherence go.

jdesrosiers commented 3 years ago

The URI Template + operator is another alternative, although I think it has the same potential for ambiguity as the / operator. The + operator allows reserved characters, so the example /myfiles/{mypath}/{filename} could be written as /myfiles/{+mypath}/{filename}, which could map /myfiles/a/path/to/file/myfile to { "mypath": "a/path/to/file", "filename": "myfile"}.
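A sketch of that mapping, assuming the template is hand-translated into a regular expression in which the + variable may contain "/" and the plain variable may not. This is an illustration, not a general URI Template matcher.

```python
import re

# Hand-written regex for /myfiles/{+mypath}/{filename}:
# mypath may span several segments, filename is exactly one segment.
PATTERN = re.compile(r"^/myfiles/(?P<mypath>.+)/(?P<filename>[^/]+)$")

match = PATTERN.match("/myfiles/a/path/to/file/myfile")
print(match.groupdict())
# {'mypath': 'a/path/to/file', 'filename': 'myfile'}
```

Here the split is unambiguous because only one of the two variables is allowed to contain a slash.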

I know @awwright has written a URI matcher for URI Template (https://github.com/awwright/uri-template-router). He might have some insight on how to deal with the ambiguities.

darrelmiller commented 3 years ago

@jdesrosiers The allowReserved property was added to the parameter object in 3.0 to provide the same capability as the + operator, but we limited it to only be usable in query parameters. https://spec.openapis.org/oas/v3.1.0#fixed-fields-9

Yeah, I've travelled that same path writing a URI matcher for URI Templates: https://github.com/tavis-software/Tavis.UriTemplates/blob/master/src/UriTemplates/UriTemplate.cs#L355 It's not fun.

awwright commented 3 years ago

uri-template-router tries to invent a general algorithm for reversing a URI Template and it is, indeed, somewhat complicated, in large part because URI Template isn't designed for matching.

The first rule I try to follow is a simple, self-evident principle: If X is a strict subset of Y, then match X before Y.

This means that http://example.com/a/b/ will match templates in the following order:

  1. http://example.com/a/{+foo}/ and http://example.com/{+foo}/b/
  2. http://example.com/{a}/{b}/ (requires exactly three slash characters in the path)
  3. http://example.com/{+a}/{+b}/ (requires at least three slash characters in the path)
  4. http://example.com/{+foo}/ (requires at least two slash characters in the path)
  5. http://example.com/{+foo} (the biggest set)

Then for picking between disjoint URI templates, prefer templates that are strict subsets when only looking at the first characters. Therefore, http://example.com/a/{+foo}/ would be matched first because if you compare only the first 20 characters of matching URIs, http://example.com/a is a strict subset of http://example.com/{+foo} (where {+foo} matches zero-or-one characters).

Finally, I match empty strings by default. Because URI Template isn't a matching syntax (like regexp), it doesn't have a way for requiring a minimum number of matches. Developers requiring a match are expected to validate the values for the matches, and call Result#next if one of the variables is invalid (e.g. is empty, when a minimum of 1 character is expected).

I recall there being one or two small bugs in the algorithm that I have. I was in the process of trying to prove mathematically that an algorithm implementing all these requirements is computationally possible, and then got sidetracked.

awwright commented 3 years ago

I dove into a "parsing strings" rabbit hole and I've come up with some interesting findings:

A URI Template is a subclass of a regular expression, which in turn can be expressed as a Non-deterministic Finite Automaton (NDFA), which in turn is expressible as a Deterministic Finite Automaton (DFA).

There is an optimal, normalized form for DFA. Once you build a DFA from a URI template, you can cache it/save it.

You can reverse the DFA back into a regular expression or URI Template.

Any URI that you can build with a URI Template can be reversed into one or more values for that URI template that will build that URI, and this can be done deterministically (so that there is a "canon" value).

It is also mathematically possible to detect these ambiguous cases, where a single URI is potentially parsed into multiple values.

The algorithms for building the DFA are complicated (at least 1k LoC), but the machine is computationally straightforward: Once you've built the state machine, it can be parsed in O(n) time, n = length of input string, no backtracking.
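To illustrate only the "run the machine" half, here is a sketch of a table-driven DFA that was built by hand rather than compiled from a URI Template. It accepts /blog/ followed by one or more ASCII digits; that pattern and all names are assumptions chosen for the example.

```python
PREFIX = "/blog/"
ACCEPT = len(PREFIX) + 1  # the only accepting state

# States 0..len(PREFIX)-1 consume the literal prefix one character at a time.
TRANSITIONS = {(i, ch): i + 1 for i, ch in enumerate(PREFIX)}
# After the prefix, the first digit enters ACCEPT and further digits stay there.
for digit in "0123456789":
    TRANSITIONS[(len(PREFIX), digit)] = ACCEPT
    TRANSITIONS[(ACCEPT, digit)] = ACCEPT

def dfa_match(path: str) -> bool:
    """Single left-to-right pass: one table lookup per character, no backtracking."""
    state = 0
    for ch in path:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject immediately
            return False
    return state == ACCEPT

print(dfa_match("/blog/2024"))  # True
print(dfa_match("/blog/"))      # False
print(dfa_match("/blog/20a4"))  # False
```

The hard part described above, deriving such a table from arbitrary templates, is not shown here.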

You can even add complicated processing on top of the URI Template. For example, you could register two URI templates:

  1. http://example.com/blog/{+foo} where foo ~= /^\d+$/
  2. http://example.com/blog/{bar} where bar ~= /[0-9a-fA-F]+/

... and the DFA compiler could detect that (1) is a subset of (2), because of the regex constraints, even though the URI Templates by themselves would suggest the opposite.
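The sketch below does not detect the subset relation; it hard-codes the ordering such a compiler would derive and shows the resulting dispatch. It assumes both constraints are anchored and ASCII-only, works on the path only rather than the full URI, and uses hypothetical names throughout.

```python
import re

# Constraints from the two templates, applied with fullmatch() so both are anchored.
DIGITS = re.compile(r"\d+", re.ASCII)   # template 1: foo
HEX = re.compile(r"[0-9a-fA-F]+")       # template 2: bar

def route(path: str):
    prefix = "/blog/"
    if not path.startswith(prefix):
        return None
    value = path[len(prefix):]
    # Template 1 is tried first: every ASCII digit string is also a hex string,
    # so checking template 2 first would shadow template 1 completely.
    if DIGITS.fullmatch(value):
        return ("template 1", {"foo": value})
    if HEX.fullmatch(value):
        return ("template 2", {"bar": value})
    return None

print(route("/blog/123"))   # ('template 1', {'foo': '123'})
print(route("/blog/cafe"))  # ('template 2', {'bar': 'cafe'})
print(route("/blog/xyz"))   # None
```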


In a world where Gödel's incompleteness theorem usually rules, and O(n!) algorithms are often our best case, this is pretty exciting. The question is, what can OpenAPI take from this? My opinion is that frameworks exist to remove implementation burden from developers. But writing all of this into OpenAPI (e.g. the regex constraints on URI Template variables) could potentially be prohibitive to complete implementations being written at all.

I'm going to continue to work on this; but what sort of work would you find useful?

yinzara commented 2 years ago

All of the APIs that Google provides via protobuf/gRPC also specify a google.api.http option that provides an HTTP 1.0 endpoint. They often use the syntax /some/prefix/{name=resourceType/*} for path parameters for specific resources. Unfortunately, because of the lack of RFC 6570 support in OpenAPI, these Google APIs cannot be represented in OpenAPI.
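For readers unfamiliar with that syntax: in google.api.http path templates, * stands for exactly one path segment, and the variable captures everything after the = sign, including the literal part. A sketch with a hand-written regular expression; the resource id "123" is a made-up value.

```python
import re

# Hand-translated from /some/prefix/{name=resourceType/*}.
# "name" captures the literal "resourceType/" plus exactly one more segment.
PATTERN = re.compile(r"^/some/prefix/(?P<name>resourceType/[^/]+)$")

match = PATTERN.match("/some/prefix/resourceType/123")
print(match.group("name"))  # resourceType/123

# Two trailing segments do not fit a single "*".
print(PATTERN.match("/some/prefix/resourceType/1/2"))  # None
```

The captured value contains a slash, which is exactly what a single OpenAPI path parameter cannot express today.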

For development teams that choose to support both gRPC and RESTful endpoints, this means they can't follow the guidelines Google provides around API design which saves small teams a huge amount of documentation and maintenance plus gives their APIs a consistent usage.

https://cloud.google.com/apis/design

ryber commented 2 years ago

Hello, I would like to offer a real-world example of the need for this.

I have a legacy API that uses optional path params for paging information. So URLs like: https://somewhere/books;count=100?someparam=foo

I'm trying to put a new API gateway in front of this API, and the gateway strictly requires all APIs to be described in OpenAPISpec. Because I cannot model the optional path param (as far as I can see), the gateway returns 404 for any request containing it. Unfortunately this is not something I can change: the API has thousands of consumers and serves 40 billion requests a year, so switching to all query params, headers, or something else would be a major breaking change. My only other option would be to write some proxy in front of the Gateway to rewrite the URLs, but that seems silly. So it would be great if the spec could support this.

eugene-bright commented 2 years ago

I propose adding support for the {+path} expression from RFC6570. Please vote on this if you agree!

rafalkrupinski commented 2 years ago

wouldn't http://example.com/{+a}/{+b}/ be ambiguous? Probably only one {+PathElement} per path should be allowed

awwright commented 2 years ago

@rafalkrupinski

> wouldn't http://example.com/{+a}/{+b}/ be ambiguous?

No, not really. It's ambiguous in the sense that there are multiple values for a and b that can produce the same URI. However, because these forms all map to the same URI, they must identify equivalent things, so there's no ambiguity in practice.

There's only one match that will be made by a finite state machine; the one such that b will never contain any "/" characters.

rafalkrupinski commented 2 years ago

@awwright

> There's only one match that will be made by a finite state machine; the one such that b will never contain any "/" characters.

Isn't that arbitrary? Might as well be parsed such that a never contains any '/' characters. Besides, that would mean the pattern degenerates to http://example.com/{+a}/{b}/ (or http://example.com/{a}/{+b}/)

ryber commented 2 years ago

In our case we have multiple params per path but they always come at the end: http://example.com/foo;count=5;start=10

yinzara commented 2 years ago

> In our case we have multiple params per path but they always come at the end: http://example.com/foo;count=5;start=10

Those are "matrix" parameters, not "path" parameters.

ryber commented 2 years ago

@yinzara My bad, you are correct. Are matrix params supported by OpenAPISpec? I have not really been able to get them to work when they are optional, which is what brought me here. I assume this would be a different issue?

awwright commented 2 years ago

@rafalkrupinski

> Isn't that arbitrary? Might as well be parsed such that a never contains any '/' characters.

It's not arbitrary in the sense that someone chose it to work that way from among other options; that's just how finite state machines work.

> Besides, that would mean the pattern degenerates to http://example.com/{+a}/{b}/ (or http://example.com/{a}/{+b}/)

Sort of, the + operator permits many additional characters, not just "/".

Besides that, yes, for any URI Template where there's multiple different values that map to the same URI, there's a normalized form that will match only the "first" of those equivalent values (leaving the "redundant" values to not match at all). It might not be expressible as a URI Template, but you can notate it as a regular expression.

karenetheridge commented 2 years ago

@ryber > my only other option would be to write some proxy in front of the Gateway to rewrite the URLs but that seems silly.

That doesn't seem silly at all to me. It's the right thing to do when marking an endpoint as deprecated while still supporting it. If you can rewrite the old form to the new form, then your application only needs to support the new form, and it also lets you gate which clients are still allowed to use the old form (you can use the User-Agent or an authorization token to block new clients from using the old form).
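A sketch of such a rewrite, assuming the "new form" simply moves the trailing matrix parameters into the query string. That target form is an assumption made for illustration; the thread does not define one. It also assumes matrix parameters only ever appear at the end of the path.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def rewrite_matrix_to_query(url: str) -> str:
    """Move trailing ;key=value path parameters into the query string."""
    parts = urlsplit(url)
    # Everything after the first ";" is treated as matrix parameters.
    path, *matrix = parts.path.split(";")
    moved = [pair.partition("=")[::2] for pair in matrix if pair]
    existing = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode(moved + existing)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))

print(rewrite_matrix_to_query("https://somewhere/books;count=100?someparam=foo"))
# https://somewhere/books?count=100&someparam=foo
```

URLs without matrix parameters pass through unchanged, so the same proxy rule can sit in front of both old and new clients.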

IMSoP commented 1 year ago

I am very much interested in this, because I would like to be able to document APIs that already exist, even if they don't conform to common designs.

As such, I think it would be useful to look at the routing patterns actually supported by popular implementations, rather than trying to define a new system, or adapt an RFC intended for a different purpose.

In PHP, the most popular router is probably nikic/fastroute, which allows arbitrary optional text (not just optional parameters), but only in limited positions, presumably to limit the ambiguity:

Furthermore parts of the route enclosed in [...] are considered optional, so that /foo[bar] will match both /foo and /foobar. Optional parts are only supported in a trailing position, not in the middle of a route.

// This route
$r->addRoute('GET', '/user/{id:\d+}[/{name}]', 'handler');
// Is equivalent to these two routes
$r->addRoute('GET', '/user/{id:\d+}', 'handler');
$r->addRoute('GET', '/user/{id:\d+}/{name}', 'handler');

// Multiple nested optional parts are possible as well
$r->addRoute('GET', '/user[/{id:\d+}[/{name}]]', 'handler');

// This route is NOT valid, because optional parts can only occur at the end
$r->addRoute('GET', '/user[/{id:\d+}]/{name}', 'handler');
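The equivalence shown above can be sketched as a small expansion step. This is written in Python for illustration and is not FastRoute's code; it assumes square brackets never appear inside a {placeholder}, and the function name is hypothetical.

```python
def expand_optional(route: str) -> list:
    """Expand trailing [optional] parts into the equivalent list of plain routes."""
    without_closing = route.rstrip("]")
    closing_count = len(route) - len(without_closing)
    segments = without_closing.split("[")
    # Every "[" must be closed by one of the trailing "]" characters,
    # which is only true when all optional parts sit at the end.
    if closing_count != len(segments) - 1:
        raise ValueError("optional parts can only occur at the end of a route")
    routes, current = [], ""
    for segment in segments:
        current += segment
        routes.append(current)
    return routes

print(expand_optional(r"/user/{id:\d+}[/{name}]"))
# ['/user/{id:\\d+}', '/user/{id:\\d+}/{name}']
print(expand_optional(r"/user[/{id:\d+}[/{name}]]"))
# ['/user', '/user/{id:\\d+}', '/user/{id:\\d+}/{name}']
```

Restricting optional parts to the tail is what keeps this expansion linear: n nested optional parts yield n + 1 plain routes rather than 2^n combinations.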

In Node.js, the ExpressJS Routing component uses a "path to regexp" library, which allows optional parts to appear anywhere in the URL. Ambiguity is resolved by matching patterns in the order they're defined (it's also possible for a handler to call next('route') to find the next matching pattern, but that's probably less relevant to the current discussion).

In ASP.NET Core, the Routing system features "URL templates" supporting optional parameters with {foo?} and defaulted parameters with {foo=bar}, as well as "catch-all" parameters with {*foo} or {**foo}. The matching rules are quite involved, but well-defined, and ambiguity is resolved using a "priority" system which by default is based on the specificity of the match:

For example, consider templates /Products/List and /Products/{id}. It would be reasonable to assume that /Products/List is a better match than /Products/{id} for the URL path /Products/List. This works because the literal segment /List is considered to have better precedence than the parameter segment /{id}.

The details of how precedence works are coupled to how route templates are defined:

  • Templates with more segments are considered more specific.
  • A segment with literal text is considered more specific than a parameter segment.
  • A parameter segment with a constraint is considered more specific than one without.
  • A complex segment is considered as specific as a parameter segment with a constraint.
  • Catch-all parameters are the least specific. See catch-all in the Route templates section for important information on catch-all routes.
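A rough sketch of how rules like these can become a sort key. It is loosely modelled on the quoted list and is not ASP.NET Core's actual algorithm: complex segments are ignored, and the ranking numbers and function names are assumptions.

```python
def segment_rank(segment: str) -> int:
    """Lower rank means more specific."""
    if not (segment.startswith("{") and segment.endswith("}")):
        return 0  # literal text
    inner = segment[1:-1]
    if inner.startswith("*"):
        return 3  # catch-all parameter
    if ":" in inner:
        return 1  # parameter with a constraint, e.g. {id:int}
    return 2      # plain parameter

def specificity_key(template: str):
    segments = [s for s in template.split("/") if s]
    # More segments sort first; ties are broken segment by segment.
    return (-len(segments), [segment_rank(s) for s in segments])

templates = ["/Products/{id}", "/{*rest}", "/Products/List", "/Products/{id:int}"]
print(sorted(templates, key=specificity_key))
# ['/Products/List', '/Products/{id:int}', '/Products/{id}', '/{*rest}']
```

A router would try candidates in this order and stop at the first template that matches the request path.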

Another way to look at it is that optional components or parameters are (at least in most cases) just a shorthand for something that can already be expressed by defining multiple routes with the same properties. Note that the FastRoute example can be expressed as two separate routes, and the ASP.NET example is ambiguous even though it uses two separate patterns, not an optional segment.

It seems to me that for most use cases, it would be sufficient to have two algorithms specified:

jpsalvesen commented 1 year ago

Random dev who bumped into this challenge here...

Maybe it would be helpful to pick a good, intuitive use case that there is a need for: Expressing a file system path.

So, https://example.com/files/a/path/to/a/file

If we express our path like /files/{directories+}/{filename} then there will be no ambiguity. There are clearly one or more directories, and they should be deserialized into a list. The file is at the end.

But if, in a slightly contrived example, we have packs of dogs and clowders of cats, how can this be expressed in a path?

/fido/buster/kitty/bella/luna
/fido/buster/buddy/bella/luna

We'll find that this can't be done. A path definition like /{+dogs}/{+cats}/ doesn't work because there is no knowing where one list starts and the other ends.

I think this is the ambiguity problem discussed and why having two multi-path segments next to each other won't work.

This path, however, can be deserialized nicely: /dogs/fido/buster/buddy/cats/bella/luna, using the following path definition:

/dogs/{+dogs}/cats/{+cats}

Well, unless there is a jokester that names their cat "cats"...
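Both the happy path and the jokester case can be sketched with a hand-written greedy regular expression standing in for /dogs/{+dogs}/cats/{+cats}. This is an illustration of the idea only, not how any particular router behaves.

```python
import re

# Greedy hand translation of /dogs/{+dogs}/cats/{+cats}.
PATTERN = re.compile(r"^/dogs/(?P<dogs>.+)/cats/(?P<cats>.+)$")

def parse(path: str) -> dict:
    match = PATTERN.match(path)
    return {name: value.split("/") for name, value in match.groupdict().items()}

print(parse("/dogs/fido/buster/buddy/cats/bella/luna"))
# {'dogs': ['fido', 'buster', 'buddy'], 'cats': ['bella', 'luna']}

# A cat named "cats": the greedy first group swallows it as a dog.
print(parse("/dogs/fido/cats/cats/luna"))
# {'dogs': ['fido', 'cats'], 'cats': ['luna']}
```

The literal separator removes the ambiguity only as long as it can never appear as a value.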

awwright commented 1 year ago

@jpsalvesen This actually has an answer, see my comment at https://github.com/OAI/OpenAPI-Specification/issues/2653#issuecomment-1219139893

In regular expressions, selecting between behaviors for how to handle this is called "greedy" and "non-greedy".
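A minimal sketch of the two behaviors, using hand-written regular expressions as stand-ins for the path part of http://example.com/{+a}/{+b}/:

```python
import re

path = "/1/2/3/"

# Greedy: the first group takes as much as it can.
greedy = re.match(r"^/(?P<a>.+)/(?P<b>.+)/$", path)
print(greedy.groupdict())   # {'a': '1/2', 'b': '3'}

# Non-greedy (note the "?"): the first group takes as little as it can.
lazy = re.match(r"^/(?P<a>.+?)/(?P<b>.+)/$", path)
print(lazy.groupdict())     # {'a': '1', 'b': '2/3'}
```

Either rule is deterministic; a specification would only need to say which one applies.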

jpsalvesen commented 1 year ago

The point I'm trying to make is that being able to write a valid regular expression does not automatically mean it fishes out the data you want.

Syntactically, http://example.com/{+a}/{+b}/ is valid. But semantically, it's still ambiguous. The more concrete example http://example.com/{+dogs}/{+cats}/ explains why and how. Just give this some thought, and you'll find out why this problem is controversial when building a syntax for describing data.

eugene-bright commented 1 year ago

Yeah, full control over regexp group definition must be exposed, not just greedy "any char". I'd like to see something like PEG parser here.

awwright commented 1 year ago

@jpsalvesen I get what you're saying, but there are multiple concepts to distinguish:

A URI template maps a variable binding to a URI.

When you write a URI Template like http://example.com/{+a}/{+b}/ then it is true that there will be multiple bindings that map to the same URI. But by definition, because they all form the same URI, they must identify the same resource. All this means is that you have multiple bindings that identify the same resource. For example, for http://example.com/1/2/3/ the binding { a: "1/2", b: "3" } identifies the same resource as { a: "1", b: "2/3" }.

Further, among these alternate variable bindings, only one is canonical. In this example, the first match ({ a: "1/2", b: "3" }) is the canonical one that will be produced by a finite state machine.
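A sketch that enumerates those bindings for the path part of the URI and then picks one by the "longest a" rule, which is an assumption standing in for whatever a real state machine would produce. Only non-empty values are considered, and the function name is hypothetical.

```python
def bindings(path: str) -> list:
    """All non-empty (a, b) pairs for which f"/{a}/{b}/" equals the given path."""
    inner = path[1:-1]  # drop the leading and trailing "/"
    found = []
    for index, ch in enumerate(inner):
        if ch == "/":
            a, b = inner[:index], inner[index + 1:]
            if a and b:
                found.append({"a": a, "b": b})
    return found

candidates = bindings("/1/2/3/")
print(candidates)
# [{'a': '1', 'b': '2/3'}, {'a': '1/2', 'b': '3'}]

# Every binding expands back to the same path ...
print({f"/{c['a']}/{c['b']}/" for c in candidates})  # {'/1/2/3/'}

# ... and a fixed rule (here: longest a) selects a single canonical one.
print(max(candidates, key=lambda c: len(c["a"])))    # {'a': '1/2', 'b': '3'}
```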

So, the URI Template may be surjective, but it is not ambiguous; there is an injective inverse.

jpoints commented 1 year ago

:(

SteffenBlake commented 11 months ago

As others above have stated, it should be a greedy state machine.

The pattern of /{+a}/{+b} will always parse /1/2/3/4/5/6/7 as a=1/2/3/4/5/6 and b=7, deterministically.

If you want a double wildcard in this case, you have to specify a deterministic split point other than just a /, so for example:

/{+a}/foos/{+b} would be the solution.

To use the cats vs. dogs example, you would use /cats/{+cats}/dogs/{+dogs} as a route, which means the user submits a URL such as /cats/bella/luna/dogs/fido/spot, which will deserialize to cats=bella/luna and dogs=fido/spot.

At this point the majority of API server software supports varieties of wildcard APIs in this manner, so it seems prudent for the OpenAPI spec to support this, as it's already fully functional in the likes of Python, JS, and C# and has been for a while.

handrews commented 8 months ago

The Moonwalk (OAS 4) proposal currently includes full support for RFC 6570 URI Templates. Further discussion should happen in that repository, as the change is too large to go in the 3.x line.