aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org
Other
15.18k stars 2.03k forks source link

Some aiohttp web server redirects fail #5319

Open rgeronimi opened 3 years ago

rgeronimi commented 3 years ago

🐞 Describe the bug Aiohttp server redirects ( triggered through "raise HTTPFound(...)") fail sometimes, depending on the final webserver targeted by the redirect, when the URL includes a question mark "?" in a query string argument (e.g., when this query argument is itself a URL), even when this question mark is URL-encoded.

πŸ’‘ To Reproduce On an aiohttp server send back a redirect to the client with "raise HTTPFound(...)", to a URL (on another final web server) that includes a query string that includes another URL with an encoded question mark. Some final web servers will reject that URL due to the second question mark, and the network trace will show that the initial aiohttp webserver had transformed the redirection URL, decoding the question mark in the query string.

Further analysis revealed this bug is due to 2 things: (1) The aiowebserver decodes and reencodes the URL through the yarl library (2) The yarl library has a bug with some query string characters that it does not escape (which is compliant with the web standards but not compliant with many webservers expectations): https://github.com/aio-libs/yarl/issues/245

Although the yarl maintainers may fix (2), this bug has been opened for 2 years and, more importantly, the usefulness of (1) is questioned: why not letting the user be responsible for the URL it submits? Reencoding it exposes him to expectation mismatchs (I never expected the web server to silently transforms my URL) and such potential bugs.

πŸ’‘ Expected behavior The redirect should work flawlessly.

πŸ“‹ Logs/tracebacks

πŸ“‹ Your version of the Python 3.7.8

πŸ“‹ Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.7.3
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: /usr/local/Caskroom/miniconda/base/envs/myenv/lib/python3.7/site-packages
Requires: async-timeout, chardet, yarl, multidict, attrs, typing-extensions
Required-by: slackclient, gcsfs, aiosocks, aiohttp-jinja2
$ python -m pip show multidict
Name: multidict
Version: 5.0.0
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /usr/local/Caskroom/miniconda/base/envs/myenv/lib/python3.7/site-packages
Requires: 
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.6.3
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /usr/local/Caskroom/miniconda/base/envs/myenv/lib/python3.7/site-packages
Requires: multidict, idna, typing-extensions
Required-by: aiohttp

πŸ“‹ Additional context

asvetlov commented 3 years ago

The solution is: aiohttp should not requote HTTP redirect URL. Other quoting logic should be untouched.

The issue is easy and wais for a champion.

rgeronimi commented 3 years ago

I submitted a PR that works in my program but ... raised interesting failures in the CI pipeline

E.g. exotic test cases where the redirection URL has linefeeds characters ("\r\n"), my patch transmits these characters unchanged.

The current code (before this PR) silently fixes these URLs with yarl by encoding these as any character.

For my patch to be valid, we should enrich it to check that the URL follows the HTTP standard. This becomes harder as we need a standard-compliant checker, and it may potentially break redirects for existing aiohttp users who were relying heavily on yarl transparent url corrections and didn't realize their redirection URLs were not standard-compliant...

In such case it may be simpler to just fix yarl (I am changing my mind on this as I realize we are pulling other potentially larger issues by trying to workaround yarl).

dralley commented 2 years ago

@asvetlov I have a concrete and trivial case of this

In [6]: import aiohttp
   ...: import asyncio
   ...: 
   ...: async def main():
   ...: 
   ...:     async with aiohttp.ClientSession() as session:
   ...:         async with session.get('https://releases.jfrog.io/artifactory/artifactory-pro-rpms/repodata/8c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz') as response:
   ...: 
   ...:             print("Status:", response.status)
   ...:             print("Content-type:", response.headers['content-type'])
   ...: 
   ...:             html = await response.text()
   ...:             print("Body:", html)
   ...: 
   ...: loop = asyncio.get_event_loop()
   ...: loop.run_until_complete(main())
Status: 403
Content-type: text/xml
Body: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access denied</Message></Error>

aiohttp processes the redirect location as the following - note the "X-Artifactory-artifactPath" parameter

https://releases-cdn.jfrog.io/filestore/8c/8c87521e43dbed223c90e23922c0c4bbe7159112?response-content-type=application/x-gzip&response-content-disposition=attachment%3Bfilename%3D%228c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz%22&x-jf-traceId=da959416009abe79&X-Artifactory-repositoryKey=artifactory-pro-rpms&X-Artifactory-projectKey=default&X-Artifactory-artifactPath=repodata/8c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz&X-Artifactory-username=anonymous&Expires=1657248968&Signature=MA21YnR3A8BncqlPFTcELv15ndn2B6yNQpaag2JuuPiWgiwjYnGHlbQkDbXp6Gk3ygrKOPvUETeb-gSzv6g64kFVnNzYzDxMRdV71nIP2YRWkufG-2R9AGSA9OtAevmAWYG-wmGazZt0L6VqX-4XDfpQPoVG5RITCMzQnQ3W~XbX8lB2AhliU0GI8QNOFzVKB8bAiFIFjh0HBV-P~~DlRaOn0ouKkdS-6paLjDq7NPEmlXrt2A~oOd5NvwViRpUNSsbov6WugJHPMMxPb5wj3B7hISYfwN6~6TWgbZ8Y6o3GXV-3zhxN4idpkFuvF90d2Aoep~gmeGKUysk3EohzVQ__&Key-Pair-Id=APKAJ6NHFWMVU3M6DPBA

Whereas wget, httpie and others process the redirect as

https://releases-cdn.jfrog.io/filestore/8c/8c87521e43dbed223c90e23922c0c4bbe7159112?response-content-type=application/x-gzip&response-content-disposition=attachment%3Bfilename%3D%228c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz%22&x-jf-traceId=e4edec532ab037c3&X-Artifactory-repositoryKey=artifactory-pro-rpms&X-Artifactory-projectKey=default&X-Artifactory-artifactPath=repodata%2F8c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz&X-Artifactory-username=anonymous&Expires=1657249015&Signature=dwsfXduAz0Avg62JQfIgXuQGP2cLv6AExW4u9gR3B0wzOu5R599u~satGkD6yLIzB4Y1uBCLhU6IO7b-ovr0wT9AJTxMaE-eiF2awcaj2eGzXpFibJVoUuzhsG3CW5koby8cytTThxRYewn1nACLna58og14o~EkXIgA3~c32y5ltR1rtJDyclQ~wpQNTpc4vX4uk2hr5z0lolmCGy0fgLxmuiFkzNmsHFAUbo1qqb8kJF~SAE-l4erLPAIlyIuDpAl6myyJ1cHmS-dolXiz8M6yNel-sJY5jaOH-iAlZoZm9PHdqmKbMqrzdgzLS2emZF5zER9g068lY~7fLqgqrw__&Key-Pair-Id=APKAJ6NHFWMVU3M6DPBA

and it works fine

dralley commented 2 years ago

Perhaps yarl could have a "max_compatibility" flag which adopts the more aggressive encoding behavior?

webknjaz commented 2 years ago

E.g. exotic test cases where the redirection URL has linefeeds characters ("\r\n"), my patch transmits these characters unchanged.

Not exotic really but request smuggling attack prevention.

webknjaz commented 2 years ago

Perhaps yarl could have a "max_compatibility" flag which adopts the more aggressive encoding behavior?

Per Andrew's comment, maybe aiohttp could make use of the raw argument. But dunno, I haven't checked the code.

webknjaz commented 2 years ago

@dralley Have you tried setting requote_redirect_url=False?

dralley commented 2 years ago

@webknjaz This works, thank you.

It does beg the question though - why does aiohttp default to rewriting the redirect URLs it receives, rather than defaulting to not doing so? All things considered it does still seem like surprising behavior.

webknjaz commented 2 years ago

I think it has to do with the fact that yarl.URL is used for internal representation of URLs in aiohttp and it so happens that multiple valid inputs may be equivalent. It's also a way to be sure that malicious inputs don't get forwarded, I suppose.

I did some Git archeology and found that this issue was first reported in #1474 against aiohttp v1.1.4. Then, with @fafhrd91's patch https://github.com/aio-libs/aiohttp/commit/ed9bf2b2ff2803757e1dd6fad6bf5d2101dec799 this setting appeared in v2.1.0, followed by a later change by @asvetlov β€” https://github.com/aio-libs/aiohttp/pull/3436 β€” that deprecated the corresponding attribute mutability. If anybody's able to recall the original motivation, it'd probably be @fafhrd91.