falconry / falcon

The no-magic web data plane API and microservices framework for Python developers, with a focus on reliability, correctness, and performance at scale.
https://falcon.readthedocs.io/en/stable/
Apache License 2.0
9.51k stars 937 forks source link

Allow multiple segment converters to have tailing segments #2061

Open CaselIT opened 2 years ago

CaselIT commented 2 years ago

The initial version of the converters that support matching multiple paths added in #1945 prevents adding additional segments after the path, meaning that something like /foo/{x:path}/bar is not supported.

Adding support to it is not trivial. A prototype find would be something like this: (edit: see reply https://github.com/falconry/falcon/issues/2061#issuecomment-1122812436 regarding how to prevent empty matches)

from dataclasses import dataclass, field
from itertools import count
from falcon.routing import CompiledRouter
from falcon.routing.converters import PathConverter

seq = count(1)

@dataclass
class Tmp:
    id: int = field(default_factory=seq.__next__)

    def on_get(self, req, resp, **kw):
        resp.media = {'id': self.id, 'kw': kw}

r = CompiledRouter()

def add(url, res):
    print(res, url.replace('{x}', '{x:path}'))
    r.add_route(url, res)

add('/foo/{x}', Tmp())
add('/foo/{x}/foo', Tmp())
add('/foo/{x}/{baz:int}', Tmp())
add('/foo/{x}/foo/bar', Tmp())
add('/foo/{x}/{baz:int}/baz', Tmp())

r._compile()
# print(r.finder_src)

def find(path, return_values, patterns, converters, params):
    converters = [PathConverter(), *converters]
    rv = {r.resource.id: r for r in return_values}
    path_len = len(path)
    if path_len > 0:
        if path[0] == 'foo':
            if path_len > 1:
                remaining_path = path[1:]
                remaining_path_len = len(remaining_path)
                if remaining_path_len > 1:
                    if remaining_path[-1] == 'bar':
                        if remaining_path[-2] == 'foo':
                            fragment = remaining_path[:-2]
                            field_value_1 = converters[0].convert(fragment)
                            if field_value_1 is not None:
                                params['x'] = field_value_1
                                return rv[4]
                    if remaining_path[-1] == 'baz':
                        fragment = remaining_path[-2]
                        field_value_2 = converters[1].convert(fragment)
                        if field_value_2 is not None:
                            fragment = remaining_path[:-2]
                            field_value_1 = converters[0].convert(fragment)
                            if field_value_1 is not None:
                                params['baz'] = field_value_2
                                params['x'] = field_value_1
                                return rv[5]
                if remaining_path_len > 0:
                    if remaining_path[-1] == 'foo':
                        fragment = remaining_path[:-1]
                        field_value_1 = converters[0].convert(fragment)
                        if field_value_1 is not None:
                            params['x'] = field_value_1
                            return rv[2]
                    fragment = remaining_path[-1]
                    field_value_2 = converters[1].convert(fragment)
                    if field_value_2 is not None:
                        fragment = remaining_path[:-1]
                        field_value_1 = converters[0].convert(fragment)
                        if field_value_1 is not None:
                            params['baz'] = field_value_2
                            params['x'] = field_value_1
                            return rv[3]
                field_value_1 = converters[0].convert(remaining_path)
                if field_value_1 is not None:
                    params['x'] = field_value_1
                    return rv[1]
                return None
            return None
        return None
    return None

r._find = find

def go(url):
    res = r.find(url)
    if res is None:
        print(url, res)
    resource, _, param, _ = res
    print(resource, param, url)

print('-' * 20)
go('/foo/foo')
go('/foo/bar')
go('/foo/bar/123')
go('/foo/bar/123/x')
go('/foo/bar/123/x/foo')
go('/foo/foo/bar/123/x/foo')
go('/foo/bar/123/x/foo/bar')
go('/foo/123/baz')
go('/foo/bar/123/baz')
go('/foo/bar/123/x/foo/bar/123/baz')

running the above yields

Tmp(id=1) /foo/{x:path}
Tmp(id=2) /foo/{x:path}/foo
Tmp(id=3) /foo/{x:path}/{baz:int}
Tmp(id=4) /foo/{x:path}/foo/bar
Tmp(id=5) /foo/{x:path}/{baz:int}/baz
--------------------
Tmp(id=2) {'x': ''} /foo/foo
Tmp(id=1) {'x': 'bar'} /foo/bar
Tmp(id=3) {'baz': 123, 'x': 'bar'} /foo/bar/123
Tmp(id=1) {'x': 'bar/123/x'} /foo/bar/123/x
Tmp(id=2) {'x': 'bar/123/x'} /foo/bar/123/x/foo
Tmp(id=2) {'x': 'foo/bar/123/x'} /foo/foo/bar/123/x/foo
Tmp(id=4) {'x': 'bar/123/x'} /foo/bar/123/x/foo/bar
Tmp(id=5) {'baz': 123, 'x': ''} /foo/123/baz
Tmp(id=5) {'baz': 123, 'x': 'bar'} /foo/bar/123/baz
Tmp(id=5) {'baz': 123, 'x': 'bar/123/x/foo/bar'} /foo/bar/123/x/foo/bar/123/baz

A decision to take would be what to do regarding supporting multiple "consume multiple segments" in a single url. Like what would /foo/{bar:path}/{baz:path} match in case of /foo/bar/baz? I think the issue would be there also if they two converter are separated by a literal segment or normal converted, since this template /foo/{bar:path}/x/{baz:path} would still be ambiguous for a route like /foo/bar/x/y/x/other/part

ref: https://github.com/falconry/falcon/pull/1945#issuecomment-1120176732

vytas7 commented 2 years ago

One unambiguous way slightly extending your prototype could be still supporting only one field of type path, but it could be placed anywhere in the template. Like /{foo:uuid}/{bar:path}/baz/{peb}/{cak}.

CaselIT commented 2 years ago

I personally think this should be a follow on PR, since it's not trivial to support a case like your example

vytas7 commented 2 years ago

Yes, absolutely, I meant splitting or simplifying this issue, not by further complicating #1945.

CaselIT commented 2 years ago

Oh ok, I misunderstood.

I don't feel supporting two or more multiple segment converter is that much harder than supporting tailing segments. In any case, even if we decide that the fist (or last) one "wins" I'm currently of the option that we should not allow them since it may be confusing on the users. I guess the main issue is the that path allows everything, if we do end up having a repath that would probably make more sense?

Still I agree with you that we could go one step at the time, and have fist a version that does not support multiple multi-segment converters

CaselIT commented 2 years ago

copying from the PR:

@kgriffs

There is a decision to take here: should a "consume multiple segments" converter be allowed to match 0 segments? the current implementation allows it.

Personally I would expect it to never match if there is no path to consume, because otherwise what's the point of using the path converter? Or at least, if there is another route registered for the exact path prefix, that one would take precedence over the route containing the path converter.

One could argue it my be useful to be able to handle /foo/bar/bang and /foo/bang` within the same resource controller, but that doesn't feel like a good API design to me (though it may be useful for re-implementing a legacy API?). So perhaps we could introduce an option to allow 0 segments but it is disabled by default? Or we introduce that along with #2061 ?


@vytas7 If I understand the problem right, IMHO, for the sake of consistency, the new converter should behave like fields behave now. They do match an empty segment after a trailing slash, and they do not otherwise.

Given the following test app,

import falcon

class Resource:
    def on_get(self, req, resp, resourceid):
        resp.media = {'path': req.path, 'resourceid': resourceid}

(app := falcon.App()).add_route('/api/{resourceid}', Resource())

Cf

$ xh http://localhost:8000/api/
HTTP/1.1 200 OK
Content-Length: 35
Content-Type: application/json

{
    "path": "/api/",
    "resourceid": ""
}

vs

$ xh http://localhost:8000/api
HTTP/1.1 404 Not Found
Content-Length: 26
Content-Type: application/json
Vary: Accept

{
    "title": "404 Not Found"
}

@CaselIT Also note that my question was only once we add support for additional parts after the path. Basically the question what if /foo/{bar:path}/baz should match /foo/baz or not.


@vytas7 Aha, right, I didn't realize the question was only about when we lift the requirement for a path segment to be last. Then maybe we don't need to make a decision right now?

In any case, I would perceive routing as strange and confusing if /foo/{bar:path}/baz matched /foo/baz, it wouldn't be consistent with the current behaviour where a field essentially needs a preceding / to match, and in this case the path field would rob /baz of that slash. So my vote is on "it shouldn't match".

CaselIT commented 2 years ago

Aha, right, I didn't realize the question was only about when we lift the requirement for a path segment to be last. Then maybe we don't need to make a decision right now?

In any case, I would perceive routing as strange and confusing if /foo/{bar:path}/baz matched /foo/baz, it wouldn't be consistent with the current behaviour where a field essentially needs a preceding / to match, and in this case the path field would rob /baz of that slash. So my vote is on "it shouldn't match".

The issue with shouldn't match is that /foo//baz should by the rule above, and it seems strange.

vytas7 commented 2 years ago

/foo//baz theoretically should match, yes, but IIRC we explicitly disallow two slashes in a row.

CaselIT commented 2 years ago

note that this is the current behaviour: /foo/{bar}/baz matches /foo//baz but not /foo/baz

vytas7 commented 2 years ago

Aha, it does... that's maybe oversight too, we do disallow a double slash in the templates at least, but maybe not when matching...

Well, in either case we should follow the current behaviour, and IMHO /foo/{bar:converter}/baz SHOULD NOT match /foo/baz REGARDLESS of the converter's type. But just IMHO :slightly_smiling_face:

CaselIT commented 2 years ago

I guess that in that sense we should keep this behavior, so the code above would not work. We probably need to check that remaining_path_len has at lest 1+ other segments, so that there is always one segment for the path converter

CaselIT commented 2 years ago

yes, replacing remaining_path_len > 1: with remaining_path_len > 2: and remaining_path_len > 0 with remaining_path_len > 1 produce the appropriate behaviour

CaselIT commented 2 years ago

Well, in either case we should follow the current behaviour, and IMHO /foo/{bar:converter}/baz SHOULD NOT match /foo/baz REGARDLESS of the converter's type. But just IMHO 🙂

I tend to agree, considering it's the current behaviour of the framework and changing that would entail quite a bit more work, since it's not easy to communicate. It may be unfortunate that // does produce different result, but it's still fine for me.

Should we document it?