bottlepy / bottle

bottle.py is a fast and simple micro-framework for python web-applications.
http://bottlepy.org/
MIT License

Router unable to match wildcard filter in the middle of a URL #726

Open claire-lee opened 9 years ago

claire-lee commented 9 years ago

I'm trying to use the :path wildcard filter (as described here) to match part of my URL, which includes a forward slash character. For example, if I have the URL:

/resources/adfs89s7/container/asdf%2Fasdf/items

(where %2F is the forward slash), I want to match it to the route:

/resources/<resource_id>/container/<container_name:path>/items

However, this is currently returning a Not Found error. I have similar URLs where the wildcard filter is at the end of the URL, e.g.

/resources/<resource_id>/container/<container_name:path>

and that seems to work fine.

defnull commented 9 years ago

The two strings %2F and / are equivalent in a URI path. You can encode any character this way. For example, a is equivalent to %61.
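As a small illustration of the decoding step described above, here is a sketch using only Python's standard `urllib.parse` module. It shows what percent-decoding does to the strings from this issue; it does not show how Bottle or any WSGI server handles them, and the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

# Percent-decoding turns an escape sequence back into the character it encodes.
decoded_slash = unquote("%2F")
decoded_a = unquote("%61")

# After decoding, the escaped path from this issue looks exactly like the
# variant written with a literal slash.
raw = "/resources/adfs89s7/container/asdf%2Fasdf/items"
decoded = unquote(raw)

# quote() leaves "/" untouched by default; safe="" makes it escape "/" too.
encoded_name = quote("asdf/asdf", safe="")
```

Once a path has been decoded like this, nothing in the resulting string records whether a given slash was originally written as / or as %2F.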

foxbunny commented 9 years ago

@claire-lee What you are trying to do is probably better suited to regexes because, afaik, the :path wildcard will consume the rest of the path, including escaped and unescaped slashes.
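To make the regex idea concrete, here is a sketch with the standard `re` module. The pattern is hand-written for this example and is an assumption, not something generated by Bottle's router: `[^/]+` stands in for a plain wildcard, and a non-greedy `.+?` stands in for a wildcard that is allowed to span slashes.

```python
import re

# Hand-written pattern, NOT produced by Bottle. The trailing "/items$" anchors
# the non-greedy group so it cannot swallow the rest of the path.
PATTERN = re.compile(
    r"^/resources/(?P<resource_id>[^/]+)"
    r"/container/(?P<container_name>.+?)/items$"
)

def match_path(path):
    """Return the captured groups as a dict, or None if the path does not match."""
    m = PATTERN.match(path)
    return m.groupdict() if m else None

escaped = match_path("/resources/adfs89s7/container/asdf%2Fasdf/items")
literal = match_path("/resources/adfs89s7/container/asdf/asdf/items")
missing_suffix = match_path("/resources/adfs89s7/container/asdf/asdf")
```

In this sketch both the escaped and the literal-slash variants match, with `container_name` capturing `asdf%2Fasdf` and `asdf/asdf` respectively, while the path without the `/items` suffix does not match.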

defnull commented 9 years ago

Oh, the :path filter works just fine. I cannot reproduce the error.

>>> import bottle
>>> app = bottle.Bottle()
>>> app.route('/resources/<resource_id>/container/<container_name:path>/items', callback=True)
>>> app.match(dict(PATH_INFO='/resources/adfs89s7/container/asdf%2Fasdf/items', REQUEST_METHOD='GET'))
(..., {'resource_id': 'adfs89s7', 'container_name': 'asdf%2Fasdf'})
>>> app.match(dict(PATH_INFO='/resources/adfs89s7/container/asdf/asdf/items', REQUEST_METHOD='GET'))
(..., {'resource_id': 'adfs89s7', 'container_name': 'asdf/asdf'})

tresni commented 8 years ago

The following returns a 404 when using %2F, but works fine with a literal forward slash:

http://something.com/test/123/asd -> 123/asd
http://something.com/test/123%2Fasd -> 404

@route("/test/<test:re:.+>", method='GET')
def test(test):
    return test

defnull commented 8 years ago

I still cannot reproduce this bug in master or release-v12:

>>> import bottle
>>> app = bottle.Bottle()
>>> app.route('/test/<test:re:.+>', callback=True)
True
>>> app.match(dict(PATH_INFO='/test/123/asd', REQUEST_METHOD='GET'))
(<GET '/test/<test:re:.+>' True>, {'test': '123/asd'})
>>> app.match(dict(PATH_INFO='/test/123%2Fasd', REQUEST_METHOD='GET'))
(<GET '/test/<test:re:.+>' True>, {'test': '123%2Fasd'})

I tried the exact script you posted (plus import statements and a run() at the end). It works as intended:

$ curl http://127.0.0.1:8080/test/123/asd
123/asd
$ curl http://127.0.0.1:8080/test/123%2Fasd
123/asd

😕

HeroicKatora commented 4 years ago

I'm facing the same issue. By way of explanation: URI-encoding the slash should mean that it is not interpreted as a hierarchy delimiter, in contrast to a literal forward slash. The point of the encoding is to remove the semantic meaning. That is, I would expect that:

However, given this service:

def print_all(*args, **kwargs): print(*args, kwargs)

import bottle
app = bottle.Bottle()
app.get("/test/123/:doc")(print_all)
app.run()

curl -o /dev/null http://localhost:8080/test/123%2Fasd
# Server log: {'doc': 'asd'}; this should return a 404 instead
curl -o /dev/null http://localhost:8080/test/123/asd
# Server log: {'doc': 'asd'}

It seems that the URI encoding is already removed prior to the call to match, hence both requests are matched as /test/123/asd. This would explain all observations:
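The suspected decode-then-match order can be sketched with the standard library alone. The regex below is a hypothetical stand-in for the route `/test/123/:doc`, and `match_after_decoding` is a made-up helper; neither is Bottle's actual code.

```python
import re
from urllib.parse import unquote

# Hypothetical stand-in for the route "/test/123/:doc"; not Bottle's router.
ROUTE = re.compile(r"^/test/123/(?P<doc>[^/]+)$")

def match_after_decoding(raw_path):
    """Decode the whole path first, then match against the route pattern."""
    decoded = unquote(raw_path)
    m = ROUTE.match(decoded)
    return m.groupdict() if m else None

from_escaped = match_after_decoding("/test/123%2Fasd")
from_literal = match_after_decoding("/test/123/asd")
```

Under this assumed order both requests yield the same captured `doc` value, which is consistent with the server log shown above.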

How this can be resolved is a difficult question. It certainly makes some sense to do URL decoding after having isolated the pure path component, as this makes it far more ergonomic to match paths with special characters such as spaces or question marks. Indeed, matching /test/doc?cheeky would match a document named doc?cheeky (encoded as /test/doc%3Fcheeky) and not a GET with a query. The handler should most definitely receive the URI-decoded path components as well. However, this scheme means that / is always interpreted semantically, even when it shouldn't be, and the client has no way to escape it.
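One possible alternative, sketched here purely as an illustration and not as a proposed patch, is to split the still-encoded path on literal slashes first and decode each segment afterwards. Both helper names are hypothetical, and the sketch simply takes the raw path as an argument; how a handler would obtain the undecoded path from the server is not covered.

```python
from urllib.parse import unquote

def split_then_decode(raw_path):
    """Split the still-encoded path on literal "/" and decode each segment."""
    return [unquote(segment) for segment in raw_path.strip("/").split("/")]

def match_test_123_doc(raw_path):
    """Hypothetical matcher for "/test/123/:doc" that works on raw segments."""
    segments = split_then_decode(raw_path)
    if len(segments) == 3 and segments[:2] == ["test", "123"]:
        return {"doc": segments[2]}
    return None  # a real application would answer with a 404 here

escaped_segments = split_then_decode("/test/123%2Fasd")
escaped_result = match_test_123_doc("/test/123%2Fasd")
literal_result = match_test_123_doc("/test/123/asd")
```

With this ordering the escaped request becomes the two segments `test` and `123/asd` and no longer matches the three-segment route, while the literal-slash request still matches. The handler would still receive decoded values, at the cost of the router needing access to the raw path.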