Open dmoklaf opened 3 years ago
The solution is: aiohttp should not requote HTTP redirect URL. Other quoting logic should be untouched.
The issue is easy and wais for a champion.
I submitted a PR that works in my program but ... raised interesting failures in the CI pipeline
E.g. exotic test cases where the redirection URL has linefeeds characters ("\r\n"), my patch transmits these characters unchanged.
The current code (before this PR) silently fixes these URLs with yarl by encoding these as any character.
For my patch to be valid, we should enrich it to check that the URL follows the HTTP standard. This becomes harder as we need a standard-compliant checker, and it may potentially break redirects for existing aiohttp users who were relying heavily on yarl transparent url corrections and didn't realize their redirection URLs were not standard-compliant...
In such case it may be simpler to just fix yarl (I am changing my mind on this as I realize we are pulling other potentially larger issues by trying to workaround yarl).
@asvetlov I have a concrete and trivial case of this
In [6]: import aiohttp
...: import asyncio
...:
...: async def main():
...:
...: async with aiohttp.ClientSession() as session:
...: async with session.get('https://releases.jfrog.io/artifactory/artifactory-pro-rpms/repodata/8c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz') as response:
...:
...: print("Status:", response.status)
...: print("Content-type:", response.headers['content-type'])
...:
...: html = await response.text()
...: print("Body:", html)
...:
...: loop = asyncio.get_event_loop()
...: loop.run_until_complete(main())
Status: 403
Content-type: text/xml
Body: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access denied</Message></Error>
aiohttp processes the redirect location as the following - note the "X-Artifactory-artifactPath" parameter
https://releases-cdn.jfrog.io/filestore/8c/8c87521e43dbed223c90e23922c0c4bbe7159112?response-content-type=application/x-gzip&response-content-disposition=attachment%3Bfilename%3D%228c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz%22&x-jf-traceId=da959416009abe79&X-Artifactory-repositoryKey=artifactory-pro-rpms&X-Artifactory-projectKey=default&X-Artifactory-artifactPath=repodata/8c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz&X-Artifactory-username=anonymous&Expires=1657248968&Signature=MA21YnR3A8BncqlPFTcELv15ndn2B6yNQpaag2JuuPiWgiwjYnGHlbQkDbXp6Gk3ygrKOPvUETeb-gSzv6g64kFVnNzYzDxMRdV71nIP2YRWkufG-2R9AGSA9OtAevmAWYG-wmGazZt0L6VqX-4XDfpQPoVG5RITCMzQnQ3W~XbX8lB2AhliU0GI8QNOFzVKB8bAiFIFjh0HBV-P~~DlRaOn0ouKkdS-6paLjDq7NPEmlXrt2A~oOd5NvwViRpUNSsbov6WugJHPMMxPb5wj3B7hISYfwN6~6TWgbZ8Y6o3GXV-3zhxN4idpkFuvF90d2Aoep~gmeGKUysk3EohzVQ__&Key-Pair-Id=APKAJ6NHFWMVU3M6DPBA
Whereas wget, httpie and others process the redirect as
https://releases-cdn.jfrog.io/filestore/8c/8c87521e43dbed223c90e23922c0c4bbe7159112?response-content-type=application/x-gzip&response-content-disposition=attachment%3Bfilename%3D%228c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz%22&x-jf-traceId=e4edec532ab037c3&X-Artifactory-repositoryKey=artifactory-pro-rpms&X-Artifactory-projectKey=default&X-Artifactory-artifactPath=repodata%2F8c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz&X-Artifactory-username=anonymous&Expires=1657249015&Signature=dwsfXduAz0Avg62JQfIgXuQGP2cLv6AExW4u9gR3B0wzOu5R599u~satGkD6yLIzB4Y1uBCLhU6IO7b-ovr0wT9AJTxMaE-eiF2awcaj2eGzXpFibJVoUuzhsG3CW5koby8cytTThxRYewn1nACLna58og14o~EkXIgA3~c32y5ltR1rtJDyclQ~wpQNTpc4vX4uk2hr5z0lolmCGy0fgLxmuiFkzNmsHFAUbo1qqb8kJF~SAE-l4erLPAIlyIuDpAl6myyJ1cHmS-dolXiz8M6yNel-sJY5jaOH-iAlZoZm9PHdqmKbMqrzdgzLS2emZF5zER9g068lY~7fLqgqrw__&Key-Pair-Id=APKAJ6NHFWMVU3M6DPBA
and it works fine
Perhaps yarl
could have a "max_compatibility" flag which adopts the more aggressive encoding behavior?
E.g. exotic test cases where the redirection URL has linefeeds characters ("\r\n"), my patch transmits these characters unchanged.
Not exotic really but request smuggling attack prevention.
Perhaps
yarl
could have a "max_compatibility" flag which adopts the more aggressive encoding behavior?
Per Andrew's comment, maybe aiohttp could make use of the raw
argument. But dunno, I haven't checked the code.
@dralley Have you tried setting requote_redirect_url=False
?
@webknjaz This works, thank you.
It does beg the question though - why does aiohttp default to rewriting the redirect URLs it receives, rather than defaulting to not doing so? All things considered it does still seem like surprising behavior.
I think it has to do with the fact that yarl.URL
is used for internal representation of URLs in aiohttp and it so happens that multiple valid inputs may be equivalent. It's also a way to be sure that malicious inputs don't get forwarded, I suppose.
I did some Git archeology and found that this issue was first reported in #1474 against aiohttp v1.1.4. Then, with @fafhrd91's patch https://github.com/aio-libs/aiohttp/commit/ed9bf2b2ff2803757e1dd6fad6bf5d2101dec799 this setting appeared in v2.1.0, followed by a later change by @asvetlov β https://github.com/aio-libs/aiohttp/pull/3436 β that deprecated the corresponding attribute mutability. If anybody's able to recall the original motivation, it'd probably be @fafhrd91.
π Describe the bug Aiohttp server redirects ( triggered through "raise HTTPFound(...)") fail sometimes, depending on the final webserver targeted by the redirect, when the URL includes a question mark "?" in a query string argument (e.g., when this query argument is itself a URL), even when this question mark is URL-encoded.
π‘ To Reproduce On an aiohttp server send back a redirect to the client with "raise HTTPFound(...)", to a URL (on another final web server) that includes a query string that includes another URL with an encoded question mark. Some final web servers will reject that URL due to the second question mark, and the network trace will show that the initial aiohttp webserver had transformed the redirection URL, decoding the question mark in the query string.
Further analysis revealed this bug is due to 2 things: (1) The aiowebserver decodes and reencodes the URL through the yarl library (2) The yarl library has a bug with some query string characters that it does not escape (which is compliant with the web standards but not compliant with many webservers expectations): https://github.com/aio-libs/yarl/issues/245
Although the yarl maintainers may fix (2), this bug has been opened for 2 years and, more importantly, the usefulness of (1) is questioned: why not letting the user be responsible for the URL it submits? Reencoding it exposes him to expectation mismatchs (I never expected the web server to silently transforms my URL) and such potential bugs.
π‘ Expected behavior The redirect should work flawlessly.
π Logs/tracebacks
π Your version of the Python 3.7.8
π Your version of the aiohttp/yarl/multidict distributions
π Additional context