falconry / falcon

The no-magic web data plane API and microservices framework for Python developers, with a focus on reliability, correctness, and performance at scale.
https://falcon.readthedocs.io/en/stable/
Apache License 2.0
9.51k stars 937 forks source link

Support ASGI `raw_path` (if present)? #1919

Open vytas7 opened 3 years ago

vytas7 commented 3 years ago

As discussed on https://github.com/falconry/falcon/issues/1895#issuecomment-843030339, ASGI supports raw_path as per the official spec, however, support is optional, and the scope is not guaranteed to contain it.

The raw path may contain additional information (such as escaped forward slashes %2F) that is lost when just using path.

Unless this is made optional, this is a breaking change wrt the existing 3.0 behaviour.

It's also an open question what to do with routing, should the router unescape the matched segments then (https://github.com/falconry/falcon/issues/1895#issuecomment-843067478)? While that would seem the most logical option, it would also be a burden that we probably don't want to impose on custom routers. Furthermore, it might incur a performance hit, if enabled by default.

So maybe this makes most sense as an optional improvement for the default CompiledRouter, controlled by router_options?

CaselIT commented 3 years ago

Is this needed if we support a :path argument type? The underlying issue is that we currently do not support it, other framework allow matching more than one path segment, so I personally would go in that direction, instead of doing something custom to some wsgi / asgi servers

vytas7 commented 3 years ago

@CaselIT There might be cases where third party clients may want to send an encoded URI segment or URL inside the path, and you might want to distinguish that from other routing, although obviously this is not a very frequent use case.

And this proposal is just for ASGI, where raw_path is part of the standard, just optional.

CaselIT commented 3 years ago

And this proposal is just for ASGI, where raw_path is part of the standard, just optional.

Oh ok, then I don't think this is a breaking change. I would not make the default an optional part of the standard. This may be supported optionally by the router then, but I think it would be behind a configuration flag

flying-sheep commented 3 years ago

One thing to mention: All three ASGI servers are built around raw_path, and do something like

return dict(
    ...
    path=unquote(path.decode("ascii")),
    raw_path=path,
    ...
)
vytas7 commented 3 years ago

Removing breaking-change assuming we'll approach this as an optional feature requiring CompiledRouter, if we decide this is a good idea™.

CaselIT commented 3 years ago

I was wondering if we cannot instead have these as an option of the request? Meaning that we allow an user to change what env/scope variable the request.path maps to.

vytas7 commented 3 years ago

We can -- but see @flying-sheep's comment on routing -- ideally we want to URL-decode the byte segments back to strings after routing.

Otherwise, if it is just for req.path, maybe my middleware recipes on https://github.com/falconry/falcon/issues/1895 are enough, just need to get them into the docs.

flying-sheep commented 3 years ago

Yes, a test case of how this would work is that

class Resource:
    def on_get(self, req, resp, id: str):
        resp.text = json.dumps(dict(id=id))

app = falcon.App(some_way_to_activate_this=True)
app.add_route('/test/{id}', Resource())

if __name__ == '__main__':
    httpd = simple_server.make_server('127.0.0.1', 8000, app)
    httpd.serve_forever()

replies the following way:

$ curl -Ss 'http://localhost:8000/test/Foo%2FBar'
{"id":"Foo/Bar"}

I think the above behavior is more intuitive than those two.

CaselIT commented 3 years ago

We can -- but see @flying-sheep's comment on routing -- ideally we want to URL-decode the byte segments back to strings after routing.

Not sure I follow. The path is a string, so I'm not sure we would need to do anything

vytas7 commented 3 years ago

If you use the raw path, your responder will get 'Foo%2FBar' as a matched parameter. Ideally, it should be 'Foo/Bar'.

But maybe we can add a recipe for working this around with a custom converter?

CaselIT commented 3 years ago

I would expect to get 'Foo%2FBar' if using the default argument, without any converter, since that's what in the path.

vytas7 commented 3 years ago

No, that's not right. IMHO at least.

As per HTTP RFCs, the path should be decoded, and the fact that we have access to an intermediate value before that process takes place doesn't change anything.