falconry / falcon

The no-magic web data plane API and microservices framework for Python developers, with a focus on reliability, correctness, and performance at scale.
https://falcon.readthedocs.io/en/stable/
Apache License 2.0

Revisit specifying routes using RFC 6570 URL Templates #161

Closed: queertypes closed this issue 9 years ago

queertypes commented 11 years ago

See #160 for a potential use case.

kgriffs commented 11 years ago

I think this is a duplicate, actually...

queertypes commented 11 years ago

It's not strictly a duplicate, but #35 and #114 are related. #35 carries a lot of the ideas behind this proposal; it's the cherry-picked version. :)

richardolsson commented 9 years ago

As I've been working on alternative routing strategies (for performance) over the past week, I have come to doubt the decision to pursue RFC 6570 compliance for the routes. What is the reasoning behind the idea?

RFC 6570 URI templates are intended for the process of variable expansion, i.e. the reverse of what Falcon needs to do. This is hinted at by the "templates" name. RFC 6570 provides a well defined syntax for describing URLs by means of variable expansion, but it actually does not even attempt (as far as I can tell) to explain how one would do the reverse. This makes sense, since doing the reverse is actually a lot more complicated and not the purpose of the RFC. As an example, consider this case:

Template:  /v1/{+base}/links
Variables: base="path/to/whatever"
Result:    /v1/path/to/whatever/links

This is the type of expansion that URI templates were created for. In this case, doing the reverse is possible (although a bit tedious) by parsing the result and looking for "/links", which identifies the end of the base variable and lets us derive its value. However, now consider a case where the variable itself contains the string "/links":

Template:  /v1/{+base}/links
Variables: base="my/links/to"
Result:    /v1/my/links/to/links

Again, expansion is straightforward. However, there is no way to do the reverse in a consistent, well-defined manner, or at least none is defined by the RFC (because that's not its purpose). The problem is that there is no reasonable, well-defined way to know whether the base variable ends before the first /links (col 7) or the last (col 16).
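To make the ambiguity concrete, here is a minimal Python sketch. It is not Falcon code: expand_reserved and split_on_delimiter are hypothetical helpers, the expansion handles only a single {+var} expression, and it skips the RFC's rule for passing existing %XX triplets through unchanged. Scanning for the literal "/links" that follows the variable gives two different answers depending on which occurrence you pick.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# RFC 3986 reserved characters. RFC 6570 reserved expansion ({+var}) leaves
# these unencoded, which is why "/" survives inside the expanded value.
RESERVED = ":/?#[]@!$&'()*+,;="


def expand_reserved(template: str, name: str, value: str) -> str:
    """Expand a single {+name} expression (hypothetical helper, level 2 only)."""
    return template.replace("{+" + name + "}", quote(value, safe=RESERVED))


def split_on_delimiter(
    url: str, prefix: str, suffix: str, use_last: bool
) -> Optional[Tuple[str, str]]:
    """Try to recover a {+var} value by scanning for the literal text after it.

    Returns (value, leftover), where leftover is any text after the chosen
    occurrence of the suffix that the template does not account for, or
    None if the suffix does not occur after the prefix.
    """
    start = len(prefix)
    end = url.rfind(suffix) if use_last else url.find(suffix, start)
    if end < start:
        return None
    return url[start:end], url[end + len(suffix):]


url = expand_reserved("/v1/{+base}/links", "base", "my/links/to")
# url == "/v1/my/links/to/links"

# Two equally "reasonable" reversal policies disagree:
first = split_on_delimiter(url, "/v1/", "/links", use_last=False)
# first == ("my", "/to/links"): stops at 0-based offset 6, leaves a tail
last = split_on_delimiter(url, "/v1/", "/links", use_last=True)
# last == ("my/links/to", ""): stops at 0-based offset 15
```

The 0-based offsets 6 and 15 correspond to columns 7 and 16. The first-occurrence policy leaves "/to/links" unaccounted for, so whether that request should be a 404 depends entirely on a policy the RFC never defines.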

In the case of Falcon, this is further complicated by the fact that it's entirely possible for no match to exist (404). With the above example, it's ambiguous which (if any) of the two URLs /v1/my/links (base = "my") or /v1/my/links/to/links (base = "my/links/to") should actually return a responder, and what policy to employ when one maps to a responder and the other does not.

Even though it's possible (everything is) to decide on some bespoke rules for matching URLs against templates, which again is not what URI templates are intended for, doing so is vastly more complicated than most (if not all) of the cases you will find in API design require. More than anything, this means that a router which tries to implement more than the absolute basics of RFC 6570 for its routing purposes will never be nearly as fast as it could be with a more limited, "proprietary" scheme designed for reverse look-up, which is the actual use case here.

To give an example of the kind of trade-off we're talking about here, I've measured 50x performance boosts using the optimizations I'm working on right now. However, because these algorithms rely on being able to treat the URI path as a well-defined tree structure, they need to be able to parse it segment by segment, which in RFC 6570 terms means nothing beyond level 1.
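Here is a rough sketch of the kind of segment tree that level 1 templates allow. SegmentRouter and its add_route()/find() methods are hypothetical names for illustration, not the router being benchmarked, and the sketch makes no attempt to reproduce the 50x figure. The point is structural: because every {var} occupies exactly one segment, look-up walks the tree one segment at a time with a dict lookup per literal, instead of trying each route's pattern in turn.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """One path segment in the routing tree."""

    def __init__(self) -> None:
        self.literals: Dict[str, "Node"] = {}  # exact segment text -> child
        self.var_name: Optional[str] = None     # name of a whole-segment {var}
        self.var_child: Optional["Node"] = None
        self.resource: Optional[object] = None  # set where a route ends


class SegmentRouter:
    """Hypothetical tree router for level 1 templates (one {var} per segment)."""

    def __init__(self) -> None:
        self.root = Node()

    def add_route(self, template: str, resource: object) -> None:
        node = self.root
        for seg in template.strip("/").split("/"):
            if seg.startswith("{") and seg.endswith("}"):
                name = seg[1:-1]
                if node.var_child is None:
                    node.var_child = Node()
                    node.var_name = name
                elif node.var_name != name:
                    raise ValueError(f"conflicting field names at {seg!r}")
                node = node.var_child
            else:
                node = node.literals.setdefault(seg, Node())
        node.resource = resource

    def find(self, path: str) -> Optional[Tuple[object, Dict[str, str]]]:
        return self._find(self.root, path.strip("/").split("/"), 0, {})

    def _find(
        self, node: Node, segments: List[str], i: int, params: Dict[str, str]
    ) -> Optional[Tuple[object, Dict[str, str]]]:
        if i == len(segments):
            return (node.resource, dict(params)) if node.resource is not None else None
        seg = segments[i]
        # Literal children are tried before the variable child.
        child = node.literals.get(seg)
        if child is not None:
            found = self._find(child, segments, i + 1, params)
            if found is not None:
                return found
        if node.var_child is not None and node.var_name is not None:
            params[node.var_name] = seg
            found = self._find(node.var_child, segments, i + 1, params)
            if found is not None:
                return found
            del params[node.var_name]  # backtrack
        return None
```

For example, after add_route("/repos/{owner}/{repo}/subscription", ...), the path /repos/falconry/falcon/subscription resolves with owner="falconry" and repo="falcon" in four dict-level steps, regardless of how many other routes are registered.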

For a framework focused on performance, I think trading speed for the support of complex URI templates would be the wrong decision, especially since they are very rarely used.

In my survey of existing, in-production, widely used APIs (which is by no means complete), I have yet to find a single endpoint scheme which contains anything more complex than simple variables, i.e. RFC 6570 level 1. Furthermore, more than 99% of the URIs I surveyed contain at most variables which span an entire segment, e.g. /repos/{owner}/{repo}/subscription (from the GitHub API).

Some exceptions to this rule include cases where variables are only parts of a segment, e.g. /inbound/routes.{format} (from the Mandrill API) or where a single URI segment includes more than one variable, e.g. the base/head pair in /repos/{owner}/{repo}/compare/{base}...{head} (from the GitHub API).
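Segments like these still fit the segment-at-a-time model if each segment that contains a field is compiled to its own small regex. A hedged sketch follows; compile_segment is a hypothetical helper, and the policy that a field cannot contain "/" and prefers the shortest value that still lets the rest of the segment match is an assumption, not something RFC 6570 specifies.

```python
import re

# A level 1 field: {name}
FIELD = re.compile(r"\{(\w+)\}")


def compile_segment(segment_template: str) -> "re.Pattern[str]":
    """Compile one path segment, e.g. 'routes.{format}' or '{base}...{head}',
    into a regex with one named group per field (hypothetical helper).
    Use .fullmatch() on a single path segment."""
    parts = []
    pos = 0
    for m in FIELD.finditer(segment_template):
        parts.append(re.escape(segment_template[pos:m.start()]))  # literal text
        # Fields never cross a '/'; the non-greedy quantifier picks the
        # shortest value that still lets the rest of the segment match.
        parts.append(f"(?P<{m.group(1)}>[^/]+?)")
        pos = m.end()
    parts.append(re.escape(segment_template[pos:]))
    return re.compile("".join(parts))
```

For example, compile_segment("{base}...{head}").fullmatch("master...topic") yields base="master" and head="topic", while compile_segment("routes.{format}") rejects "routesXjson" because the literal "." is escaped.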

I have yet to see any situation where a well-designed, widely used API contains anything more complex than this. In this issue tracker I have seen a request for supporting variables that span (a variable number of) segments, i.e. RFC 6570 level 2 reserved expansion ({+var}) or level 3 path-segment expansion ({/var}). I think these are fair requests, although I would personally avoid designing an API which requires them, and I believe the use case described by that user could be handled by another solution (e.g. an add_route() flag which marks a pattern as a base that should also match any URI beginning with that pattern).
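The add_route() flag idea could look roughly like this. PrefixRouter, match_segments and the prefix parameter are all hypothetical; the sketch supports only whole-segment {var} fields and checks routes in registration order.

```python
from typing import Dict, List, Optional, Tuple


def match_segments(pattern: List[str], segments: List[str]) -> Optional[Dict[str, str]]:
    """Match equal-length segment lists; '{name}' captures a whole segment."""
    params: Dict[str, str] = {}
    for p, s in zip(pattern, segments):
        if p.startswith("{") and p.endswith("}"):
            params[p[1:-1]] = s
        elif p != s:
            return None
    return params


class PrefixRouter:
    """Hypothetical router where add_route(..., prefix=True) marks a template
    as a base that also matches any deeper path starting with it."""

    def __init__(self) -> None:
        self._routes: List[Tuple[List[str], object, bool]] = []

    def add_route(self, template: str, resource: object, prefix: bool = False) -> None:
        self._routes.append((template.strip("/").split("/"), resource, prefix))

    def find(self, path: str) -> Optional[Tuple[object, Dict[str, str], str]]:
        segments = path.strip("/").split("/")
        for pattern, resource, prefix in self._routes:
            n = len(pattern)
            if len(segments) < n or (len(segments) > n and not prefix):
                continue
            params = match_segments(pattern, segments[:n])
            if params is not None:
                # Whatever follows the base is passed along verbatim, so no
                # reverse parsing of a {+var} expression is ever needed.
                return resource, params, "/".join(segments[n:])
        return None
```

With add_route("/v1/{project}/files", files, prefix=True), the path /v1/acme/files/path/to/doc.txt matches with project="acme" and the remainder "path/to/doc.txt", which covers the multi-segment use case without the ambiguity of reversing {+var}.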

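I think the best thing to do at this point would be to consider the actual use cases for APIs. Many of the features in RFC 6570 are overkill for REST API design, so I see no reason to pursue RFC 6570 support. Here is the course of action I would suggest instead:

1. Figure out what is actually needed in most cases, e.g. based on the API survey that I'm conducting right now.
2. Implement the smallest possible feature set that meets common requirements without sacrificing performance.
3. Decouple routing from falcon.API to allow users and contributors to implement bespoke routers for cases where things like routing-time validation or complex URI schemes are prioritized over look-up performance.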

Curious to hear your reactions to these thoughts.

kgriffs commented 9 years ago

Thanks @richardolsson for your very thoughtful analysis and recommendations. I agree that RFC 6570 isn't as suitable for routing as it may have first appeared when we originally decided to use URI templates (based on an early draft of the RFC). IIRC, the original thought was that if we could use the same thing for routes as would be consumed by clients, you could just use the same strings verbatim in API docs and json-home. Unfortunately, as you pointed out, there are some cases where this breaks down, so we may need to diverge from the RFC (i.e., base Falcon routing templates on level 1 URI templates but add custom extensions as needed per the outcome of the next work item, below).

> Figure out what is actually needed in most cases, e.g. based on the API survey that I'm conducting right now.

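Makes sense. It looks like you've identified 2 additional formats so far:

1. Variables that make up only part of a segment, e.g. /inbound/routes.{format}
2. More than one variable in a single segment, e.g. /repos/{owner}/{repo}/compare/{base}...{head}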

One other thing that I've gotten a couple of requests for is a way to specify types. Those could go in the route template itself, or be tacked on as additional params to add_route(). IMO, putting them in the template string would be more elegant. We could do something similar to Bottle's "filters" concept. The question is, can this be done with no significant performance impact when the filter/typing is not used?
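On the performance question, one option is to resolve converters once, when the route is added, so an untyped field costs only a None check per request. The {name:type} syntax, the CONVERTERS table and both functions below are hypothetical, loosely in the spirit of Bottle's filters rather than a copy of its API.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical converter names, loosely modeled on Bottle-style route filters.
CONVERTERS: Dict[str, Callable[[str], object]] = {"int": int, "float": float}
FIELD = re.compile(r"\{(\w+)(?::(\w+))?\}")  # {name} or {name:type}

# (literal, field name, converter); literal is None for field segments.
Segment = Tuple[Optional[str], Optional[str], Optional[Callable[[str], object]]]


def compile_template(template: str) -> List[Segment]:
    """Parse the template once, when the route is added, not per request."""
    compiled: List[Segment] = []
    for seg in template.strip("/").split("/"):
        m = FIELD.fullmatch(seg)
        if m is None:
            compiled.append((seg, None, None))  # literal segment
        else:
            type_name = m.group(2)
            # An unknown type name raises KeyError here, at registration time.
            conv = CONVERTERS[type_name] if type_name else None
            compiled.append((None, m.group(1), conv))
    return compiled


def match(compiled: List[Segment], path: str) -> Optional[Dict[str, object]]:
    segments = path.strip("/").split("/")
    if len(segments) != len(compiled):
        return None
    params: Dict[str, object] = {}
    for (literal, name, conv), seg in zip(compiled, segments):
        if name is None:
            if seg != literal:
                return None
        elif conv is None:
            params[name] = seg  # untyped field: a None check, no conversion
        else:
            try:
                params[name] = conv(seg)
            except ValueError:
                return None  # a failed conversion is treated as "no match"
    return params
```

For example, the template /items/{item_id:int}/{label} matches /items/42/red as {"item_id": 42, "label": "red"} and does not match /items/abc/red.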

> Implement the smallest possible feature set that meets common requirements without sacrificing performance.

+1

> Decouple routing from falcon.API to allow users and contributors to implement bespoke routers for cases where things like routing-time validation or complex URI schemes are prioritized over look-up performance.

I think this would be helpful not just for complex URI schemes, but also for supporting object-based routing à la what @warsaw implemented for Mailman 3. In fact, I thought I had already created an issue for decoupling routing from the API class, but I can't find it. Regardless, I think 0.3 would be a good time to do this, since we are messing around with routing anyway. If you agree, go ahead and create an issue and I can assign the milestone.
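A minimal sketch of what decoupling could mean: the application object only needs a router with add_route() and find(), so any bespoke implementation can be swapped in. Router, ExactRouter and App are hypothetical names, not Falcon's interface; real Falcon responders also receive req and resp objects, which this sketch leaves out.

```python
from typing import Dict, Optional, Protocol, Tuple


class Router(Protocol):
    """Hypothetical pluggable-router contract."""

    def add_route(self, template: str, resource: object) -> None: ...

    def find(self, path: str) -> Optional[Tuple[object, Dict[str, str]]]: ...


class ExactRouter:
    """Trivial default router: literal paths only, no fields."""

    def __init__(self) -> None:
        self._table: Dict[str, object] = {}

    def add_route(self, template: str, resource: object) -> None:
        self._table[template] = resource

    def find(self, path: str) -> Optional[Tuple[object, Dict[str, str]]]:
        resource = self._table.get(path)
        return None if resource is None else (resource, {})


class App:
    """Hypothetical application object that delegates all routing decisions."""

    def __init__(self, router: Optional[Router] = None) -> None:
        self.router: Router = router if router is not None else ExactRouter()

    def add_route(self, template: str, resource: object) -> None:
        self.router.add_route(template, resource)

    def dispatch(self, method: str, path: str) -> str:
        found = self.router.find(path)
        if found is None:
            return "404 Not Found"
        resource, params = found
        responder = getattr(resource, "on_" + method.lower(), None)
        if responder is None:
            return "405 Method Not Allowed"
        return responder(**params)
```

Passing a different router to App(router=...), such as a segment tree or an object-based traversal, changes look-up behavior without touching dispatch().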

/me is reminded that he still needs to do a thorough housecleaning of the issue tracker

kgriffs commented 9 years ago

UPDATED: The deliverable for this will be to update the language used in the docs to remove references to the RFC, and to include examples of the types of routes that are now supported, such as:

/repos/{org}/{repo}/compare/{usr0}:{branch0}...{usr1}:{branch1}/full