falconry / falcon

The no-magic web data plane API and microservices framework for Python developers, with a focus on reliability, correctness, and performance at scale.
https://falcon.readthedocs.io/en/stable/
Apache License 2.0
9.51k stars 937 forks source link

When following raw URL recipe based on RAW_URI, it breaks routing in TestClient #2157

Closed liborjelinek closed 1 year ago

liborjelinek commented 1 year ago

I'm building REST API with many %-encoded slashes in URI (i.e., as data, not path separator). For example, request URI /book/baking-book/document/sweet%2Fcheescake.md will never reach the route declared as /book/{book}/document/{path}.

I understand that's not Falcon's fault, but the WSGI server, which "eats" %-encoding before providing it to Falcon in the PATH_INFO CGI variable. Thank you for the well-explained reason in FAQ and decoding raw URL recipe based on Gunicorn's RAW_URI non-standard variable.

However, if I follow this recipe, it breaks all tests because TestClient hard-code RAW_URI to /, so no URI will route properly anymore.

vytas7 commented 1 year ago

Hi @mattwarrick! I agree this is a bug as TestClient ought to either correctly encode the RAW_URI CGI variable or omit it altogether.

We'll get this fixed, but for the time being you can work around the problem by manually providing the correct rendition of RAW_URI via the extras parameter when simulating requests, e.g.:

import falcon
import falcon.testing
import falcon.uri

class RawPathComponent:
    def process_request(self, req, resp):
        raw_uri = req.env.get('RAW_URI') or req.env.get('REQUEST_URI')

        # NOTE: Reconstruct the percent-encoded path from the raw URI.
        if raw_uri:
            req.path, _, _ = raw_uri.partition('?')

class URLResource:
    def on_get(self, req, resp, url):
        # NOTE: url here is potentially percent-encoded.
        url = falcon.uri.decode(url)

        resp.media = {'url': url}

    def on_get_status(self, req, resp, url):
        # NOTE: url here is potentially percent-encoded.
        url = falcon.uri.decode(url)

        resp.media = {'cached': True}

app = falcon.App(middleware=[RawPathComponent()])
app.add_route('/cache/{url}', URLResource())
app.add_route('/cache/{url}/status', URLResource(), suffix='status')

def test_raw_uri():
    client = falcon.testing.TestClient(app)

    resp1 = client.get(
        '/cache/http%3A%2F%2Ffalconframework.org',
        extras={'RAW_URI': '/cache/http%3A%2F%2Ffalconframework.org'},
    )
    assert resp1.json == {'url': 'http://falconframework.org'}

    resp2 = client.get(
        '/cache/http%3A%2F%2Ffalconframework.org/status',
        extras={'RAW_URI': '/cache/http%3A%2F%2Ffalconframework.org/status'},
    )
    assert resp2.json == {'cached': True}

If you have a large testing suite, and it is impractical to apply this treatment to every invocation, you could probably mitigate the impact by first checking the raw path for '%2F' in the middleware itself (and/or alternatively restrict the effect by other parts of the path such as prefix):

        # NOTE: Reconstruct the percent-encoded path from the raw URI.
        if raw_uri and '%2F' in raw_uri:
            req.path, _, _ = raw_uri.partition('?')
liborjelinek commented 1 year ago

I understand what adding "good first issue" and "needs contributor" tags means to me :-) I'll try to prepare PR fixit it.

vytas7 commented 1 year ago

It does not necessarily mean you, but you are of course very welcome to tackle this one :smiling_imp:

vytas7 commented 1 year ago

Resolved fixed in #2159

liborjelinek commented 11 months ago

A follow-up for others that might stumble upon this issue in future: I have collected together steps required for creating URL-encoded URLs in Falcon in a blog post.