falconry / falcon

The no-magic web data plane API and microservices framework for Python developers, with a focus on reliability, correctness, and performance at scale.
https://falcon.readthedocs.io/en/stable/
Apache License 2.0
9.51k stars 937 forks source link

feat(routing): add support for multiple path segment fields #1945

Closed CaselIT closed 2 years ago

CaselIT commented 3 years ago

Summary of Changes

This adds support for a converter to be declared as consuming all the remaining path.

The current implementation does not allow nesting these converters and is quite conservative (I think), but I believe we can improve on this later if we need to.

Related Issues

Closes #423 Fixes #648 Relates to #1895

Pull Request Checklist

This is just a reminder about the most common mistakes. Please make sure that you tick all appropriate boxes. But please read our contribution guide at least once; it will save you a few review cycles!

If an item doesn't apply to your pull request, check it anyway to make it apparent that there's nothing to do.

If you have any questions to any of the points above, just submit and ask! This checklist is here to help you, not to deter you from contributing!

PR template inspired by the attrs project.

codecov[bot] commented 3 years ago

Codecov Report

Merging #1945 (18fcc6a) into master (83682f6) will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master     #1945   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           63        63           
  Lines         6715      6752   +37     
  Branches      1243      1254   +11     
=========================================
+ Hits          6715      6752   +37     
Impacted Files Coverage Δ
falcon/routing/__init__.py 100.00% <100.00%> (ø)
falcon/routing/compiled.py 100.00% <100.00%> (ø)
falcon/routing/converters.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 83682f6...18fcc6a. Read the comment docs.

CaselIT commented 2 years ago

After https://github.com/falconry/falcon/pull/1990 is merged we should update the docs there to mention that for simple cases :path may be used

vytas7 commented 2 years ago

@CaselIT Have you considered to buck the trend seen among other frameworks where this is packaged as a converter, and add new syntax to router templates instead?

I don't know what the most aesthetic way to express that could be, but the advantage would be that one could use any converter with a path segment, or even use no converter for simple cases? Something like /route/{itemid}/repository/{url/} or /route/{itemid}/repository/{/url} maybe? (And with converters /route/{itemid}/repository/{url/:relativeurl} or /route/{itemid}/repository/{/url:relativeurl}.)

(This is just a random idea though, maybe it makes sense to have a path converter after all.)

CaselIT commented 2 years ago

I'm not really sold on the idea, since it's another thing that should be documented, explained etc.

I think your example would work better with /route/{itemid}/repository/{url:relativepath}. path is not special, meaning that the way it indicates that it can captures more than one segment is available to other converters

vytas7 commented 2 years ago

Agreed that my idea ain't very pretty either. But functionality wise, it depends on the way you view it. Say that you have a useful converter that might accept path, but in another case you would want to restrict it to a single field, or just use it in another field than the very last one (suffix). I guess you could subclass it to override that flag, and then register the new type as well, but it won't be particularly pretty to document either.

CaselIT commented 2 years ago

Since the interface of the converters changes, I think it makes sense to have two implementation in that case. (path like converters get passed a list of path segments, not a single segment)

vytas7 commented 2 years ago

Another issue that is useful to have in mind when discussing the big picture: https://github.com/falconry/falcon/issues/1567 (which would actually favour your current design, but maybe we could devise a more generic way to provide hints?)

CaselIT commented 2 years ago

From a quick look I don't think that the current implementation is incompatible with that issue

vytas7 commented 2 years ago

I know that although Falcon has moved away from RFC 6570, the said paper has all kinds of funky expansions, so maybe something familiar could be adapted from Section 3.2.6, like {list*} :arrow_right: /red/green/blue. (Or maybe nobody cares about RFC 6570 any longer?)

Maybe we could have a discussion [meeting?] about this on chat at some point :thinking: I don't have any elegant suggestions, but maybe someone else does.

CaselIT commented 2 years ago

That may be an option, but we risk having two ways of doing something, having both the converters and this special syntax. Also what would happen if both a converter and * are used together? Like {list*:int}

vytas7 commented 2 years ago

Dunno, I was just thinking what the options were before we move forward with this. But we should move forward in one or another way 🙂

vytas7 commented 2 years ago

It would be also good to reason about this change in the perspective of adding re converter (https://github.com/falconry/falcon/issues/857). Requiring a CONSUME_PATH attribute on the converter class would force a decision to make re of one or another type, however, I can imagine both could be useful :thinking:

Of course, there are different ways to accommodate this in the CONSUME_PATH design as well. One could add two different types, like re and repath, or just make path accept an optional regexp argument?

CaselIT commented 2 years ago

thanks for the review @kgriffs

sorry for the delay in getting to this

CaselIT commented 2 years ago

There is a decision to take here: should a "consume multiple segments" converter be allowed to match 0 segments? the current implementation allows it.

Btw I've taken another look at supporting additional segments after a "consume multiple segments" converter it's not trivial. Assuming matching zero segments is ok, the finder would need to be something like the following

from dataclasses import dataclass, field
from itertools import count
from falcon.routing import CompiledRouter
from falcon.routing.converters import PathConverter

seq = count(1)

@dataclass
class Tmp:
    id: int = field(default_factory=seq.__next__)

    def on_get(self, req, resp, **kw):
        resp.media = {'id': self.id, 'kw': kw}

r = CompiledRouter()

def add(url, res):
    print(res, url.replace('{x}', '{x:path}'))
    r.add_route(url, res)

add('/foo/{x}', Tmp())
add('/foo/{x}/foo', Tmp())
add('/foo/{x}/{baz:int}', Tmp())
add('/foo/{x}/foo/bar', Tmp())
add('/foo/{x}/{baz:int}/baz', Tmp())

r._compile()
# print(r.finder_src)

def find(path, return_values, patterns, converters, params):
    converters = [PathConverter(), *converters]
    rv = {r.resource.id: r for r in return_values}
    path_len = len(path)
    if path_len > 0:
        if path[0] == 'foo':
            if path_len > 1:
                remaining_path = path[1:]
                remaining_path_len = len(remaining_path)
                if remaining_path_len > 1:
                    if remaining_path[-1] == 'bar':
                        if remaining_path[-2] == 'foo':
                            fragment = remaining_path[:-2]
                            field_value_1 = converters[0].convert(fragment)
                            if field_value_1 is not None:
                                params['x'] = field_value_1
                                return rv[4]
                    if remaining_path[-1] == 'baz':
                        fragment = remaining_path[-2]
                        field_value_2 = converters[1].convert(fragment)
                        if field_value_2 is not None:
                            fragment = remaining_path[:-2]
                            field_value_1 = converters[0].convert(fragment)
                            if field_value_1 is not None:
                                params['baz'] = field_value_2
                                params['x'] = field_value_1
                                return rv[5]
                if remaining_path_len > 0:
                    if remaining_path[-1] == 'foo':
                        fragment = remaining_path[:-1]
                        field_value_1 = converters[0].convert(fragment)
                        if field_value_1 is not None:
                            params['x'] = field_value_1
                            return rv[2]
                    fragment = remaining_path[-1]
                    field_value_2 = converters[1].convert(fragment)
                    if field_value_2 is not None:
                        fragment = remaining_path[:-1]
                        field_value_1 = converters[0].convert(fragment)
                        if field_value_1 is not None:
                            params['baz'] = field_value_2
                            params['x'] = field_value_1
                            return rv[3]
                field_value_1 = converters[0].convert(remaining_path)
                if field_value_1 is not None:
                    params['x'] = field_value_1
                    return rv[1]
                return None
            return None
        return None
    return None

r._find = find

def go(url):
    res = r.find(url)
    if res is None:
        print(url, res)
    resource, _, param, _ = res
    print(resource, param, url)

print('-' * 20)
go('/foo/foo')
go('/foo/bar')
go('/foo/bar/123')
go('/foo/bar/123/x')
go('/foo/bar/123/x/foo')
go('/foo/foo/bar/123/x/foo')
go('/foo/bar/123/x/foo/bar')
go('/foo/123/baz')
go('/foo/bar/123/baz')
go('/foo/bar/123/x/foo/bar/123/baz')

running the above yields

Tmp(id=1) /foo/{x:path}
Tmp(id=2) /foo/{x:path}/foo
Tmp(id=3) /foo/{x:path}/{baz:int}
Tmp(id=4) /foo/{x:path}/foo/bar
Tmp(id=5) /foo/{x:path}/{baz:int}/baz
--------------------
Tmp(id=2) {'x': ''} /foo/foo
Tmp(id=1) {'x': 'bar'} /foo/bar
Tmp(id=3) {'baz': 123, 'x': 'bar'} /foo/bar/123
Tmp(id=1) {'x': 'bar/123/x'} /foo/bar/123/x
Tmp(id=2) {'x': 'bar/123/x'} /foo/bar/123/x/foo
Tmp(id=2) {'x': 'foo/bar/123/x'} /foo/foo/bar/123/x/foo
Tmp(id=4) {'x': 'bar/123/x'} /foo/bar/123/x/foo/bar
Tmp(id=5) {'baz': 123, 'x': ''} /foo/123/baz
Tmp(id=5) {'baz': 123, 'x': 'bar'} /foo/bar/123/baz
Tmp(id=5) {'baz': 123, 'x': 'bar/123/x/foo/bar'} /foo/bar/123/x/foo/bar/123/baz

In the above example /foo/{x:path}/foo matches /foo/foo with x=''

I'll copy the above in a new issue. Edit: created https://github.com/falconry/falcon/issues/2061

kgriffs commented 2 years ago

There is a decision to take here: should a "consume multiple segments" converter be allowed to match 0 segments? the current implementation allows it.

Personally I would expect it to never match if there is no path to consume, because otherwise what's the point of using the path converter? Or at least, if there is another route registered for the exact path prefix, that one would take precedence over the route containing the path converter.

One could argue it my be useful to be able to handle /foo/bar/bang and /foo/bang` within the same resource controller, but that doesn't feel like a good API design to me (though it may be useful for re-implementing a legacy API?). So perhaps we could introduce an option to allow 0 segments but it is disabled by default? Or we introduce that along with #2061 ?

kgriffs commented 2 years ago

Code looks good, we just need a consensus on what to do about allowing 0-length path matches.

vytas7 commented 2 years ago

If I understand the problem right, IMHO, for the sake of consistency, the new converter should behave like fields behave now. They do match an empty segment after a trailing slash, and they do not otherwise.

Given the following test app,

import falcon

class Resource:
    def on_get(self, req, resp, resourceid):
        resp.media = {'path': req.path, 'resourceid': resourceid}

(app := falcon.App()).add_route('/api/{resourceid}', Resource())

Cf

$ xh http://localhost:8000/api/
HTTP/1.1 200 OK
Content-Length: 35
Content-Type: application/json

{
    "path": "/api/",
    "resourceid": ""
}

vs

$ xh http://localhost:8000/api
HTTP/1.1 404 Not Found
Content-Length: 26
Content-Type: application/json
Vary: Accept

{
    "title": "404 Not Found"
}
CaselIT commented 2 years ago

That's the current behavior, unless strip_url_path_trailing_slash=True, in which case both yield 404

Also note that my question was only once we add support for additional parts after the path. Basically the question what if /foo/{bar:path}/baz should match /foo/baz or not.

vytas7 commented 2 years ago

Aha, right, I didn't realize the question was only about when we lift the requirement for a path segment to be last. Then maybe we don't need to make a decision right now?

In any case, I would perceive routing as strange and confusing if /foo/{bar:path}/baz matched /foo/baz, it wouldn't be consistent with the current behaviour where a field essentially needs a preceding / to match, and in this case the path field would rob /baz of that slash. So my vote is on "it shouldn't match".

CaselIT commented 2 years ago

Aha, right, I didn't realize the question was only about when we lift the requirement for a path segment to be last. Then maybe we don't need to make a decision right now?

In any case, I would perceive routing as strange and confusing if /foo/{bar:path}/baz matched /foo/baz, it wouldn't be consistent with the current behaviour where a field essentially needs a preceding / to match, and in this case the path field would rob /baz of that slash. So my vote is on "it shouldn't match".

We really ought to move the conversation here https://github.com/falconry/falcon/issues/2061

vytas7 commented 2 years ago

Sure -- I was just pinged here on this issue :slightly_smiling_face:

kgriffs commented 2 years ago

If I understand the problem right, IMHO, for the sake of consistency, the new converter should behave like fields behave now. They do match an empty segment after a trailing slash, and they do not otherwise.

Good point, I'm OK with moving forward w/ this as long as the behavior is explicitly documented for the developer.

the question was only about when we lift the requirement for a path segment to be last.

Actually, I wanted your input on both questions. 😄 Second one is not urgent. Thanks!

vytas7 commented 2 years ago

Yes so, I think no converter should match 0 fields, because that would be a substantial change in comparison to how the framework behaves now. Matching an empty string after a trailing slash is a bit special edge case (or oversight) which already exists now, so path converter can also do the same, but /foo/{bar}/{baz:path} ought not to match /foo/bar.

CaselIT commented 2 years ago

thanks for the review

but /foo/{bar}/{baz:path} ought not to match /foo/bar.

it does not at the moment. without / in the last segment the behavior will be the same as if using /foo/{bar}/{baz} as template

kgriffs commented 2 years ago

vytas7 requested changes 20 hours ago

lol, this is why it is good to require >= 2 approvals. You guys inevitably catch things I miss.

CaselIT commented 2 years ago

LGTM sans the still pending FAQ update: How can I handle forward slashes within a route template field?

sorry I missed the previous message. Will do that