aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org

On redirects, middle URL with ø char gets parsed wrongly - leading to a 404 #10047

Open Alekky09 opened 1 day ago

Alekky09 commented 1 day ago

Describe the bug

Hello,

If I fetch https://cornelius-k.dk/synsproeve/ with aiohttp, it follows a chain of redirects and ends with a 404, because the last request in the chain goes to https://cornelius-k.dk/synspr\udcf8ve.

It looks like the Location header is decoded incorrectly. The raw value, which I found in Response._raw_headers, is b'https://cornelius-k.dk/synspr\xf8ve', where \xf8 is presumably ø encoded as a single Latin-1 byte rather than as UTF-8.
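To illustrate where the \udcf8 could come from, here is a minimal standard-library sketch. It assumes the header bytes are decoded as UTF-8 with the surrogateescape error handler, which is consistent with the output in this report but is not confirmed in this thread. The variable names are made up for the example.

```python
# Raw Location value as reported in Response._raw_headers.
raw_location = b"https://cornelius-k.dk/synspr\xf8ve"

# Assumption: decoding as UTF-8 with surrogateescape. The byte 0xF8 is not
# valid UTF-8, so it is mapped to the lone surrogate U+DCF8.
as_surrogate = raw_location.decode("utf-8", "surrogateescape")
print(ascii(as_surrogate))

# Decoding the same bytes as Latin-1 yields the intended character.
as_latin1 = raw_location.decode("latin-1")
print(as_latin1 == "https://cornelius-k.dk/synspr\u00f8ve")

# surrogateescape is reversible, so the original bytes can be recovered.
recovered = as_surrogate.encode("utf-8", "surrogateescape")
print(recovered == raw_location)
```

If that assumption holds, the 404 follows naturally: the request is sent to a path containing an escaped stray byte instead of ø.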

To Reproduce

Code block:

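import asyncio

import aiohttp

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'
}

async def fetch_url(url):
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url) as response:
            # Print every intermediate response in the redirect chain.
            for i in response.history:
                print(i.url)
                print(i._headers)
                print(i._raw_headers)
            # Return the final status and URL, as shown in the output below.
            return response.status, response.url

# asyncio.run() lets the snippet run as a plain script (no top-level await).
print(asyncio.run(fetch_url("https://cornelius-k.dk/synsproeve/")))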

The final URL in the redirect chain is https://cornelius-k.dk/synspr�ve (that is, synspr\udcf8ve) instead of https://cornelius-k.dk/synsprøve, so the request returns a 404.

Expected behavior

The Location URLs in the redirect chain should be parsed correctly, so that the client fetches the correct final URL.
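As a possible client-side workaround, the sketch below shows one way to turn the raw Location bytes into a usable URL. It is an illustration under assumptions, not aiohttp's behavior: recover_location is a hypothetical helper, and the Latin-1 fallback is a guess based on the \xf8 byte seen in this report.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def recover_location(raw: bytes) -> str:
    """Hypothetical helper: decode a raw Location value and percent-encode its path."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: servers that send non-UTF-8 bytes use Latin-1.
        text = raw.decode("latin-1")
    parts = urlsplit(text)
    # Keep "/" and existing percent-escapes; encode other characters as UTF-8.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(recover_location(b"https://cornelius-k.dk/synspr\xf8ve"))
print(recover_location(b"http://cornelius-k.dk/synspr%C3%B8ve"))
```

One way to use such a helper would be to request with allow_redirects=False, read the Location value from the response's raw_headers, and follow each hop manually.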

Logs/tracebacks

Output of the code block:

https://cornelius-k.dk/synsproeve/
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Tue, 26 Nov 2024 16:02:17 GMT', 'Content-Type': 'text/html', 'Content-Length': '162', 'd-cache': 'from-cache', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'x-content-type-options': 'nosniff', 'strict-transport-security': 'max-age=31536000; preload', 'x-frame-options': 'SAMEORIGIN', 'content-security-policy': "frame-ancestors 'self'", 'Location': 'https://cornelius-k.dk/synsproeve', 'd-geo': 'US')>
((b'server', b'nginx'), (b'date', b'Tue, 26 Nov 2024 16:02:17 GMT'), (b'content-type', b'text/html'), (b'content-length', b'162'), (b'd-cache', b'from-cache'), (b'cache-control', b'no-cache, no-store, must-revalidate'), (b'expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'x-content-type-options', b'nosniff'), (b'strict-transport-security', b'max-age=31536000; preload'), (b'x-frame-options', b'SAMEORIGIN'), (b'content-security-policy', b"frame-ancestors 'self'"), (b'location', b'https://cornelius-k.dk/synsproeve'), (b'd-geo', b'US'))
https://cornelius-k.dk/synsproeve
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Tue, 26 Nov 2024 16:02:18 GMT', 'Content-Type': 'text/html', 'Content-Length': '162', 'Location': 'http://cornelius-k.dk/synspr%C3%B8ve', 'd-cache': 'from-cache', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'x-content-type-options': 'nosniff', 'strict-transport-security': 'max-age=31536000; preload', 'x-frame-options': 'SAMEORIGIN', 'content-security-policy': "frame-ancestors 'self'", 'd-geo': 'US')>
((b'server', b'nginx'), (b'date', b'Tue, 26 Nov 2024 16:02:18 GMT'), (b'content-type', b'text/html'), (b'content-length', b'162'), (b'location', b'http://cornelius-k.dk/synspr%C3%B8ve'), (b'd-cache', b'from-cache'), (b'cache-control', b'no-cache, no-store, must-revalidate'), (b'expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'x-content-type-options', b'nosniff'), (b'strict-transport-security', b'max-age=31536000; preload'), (b'x-frame-options', b'SAMEORIGIN'), (b'content-security-policy', b"frame-ancestors 'self'"), (b'd-geo', b'US'))
http://cornelius-k.dk/synspr%C3%B8ve
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Tue, 26 Nov 2024 16:02:18 GMT', 'Content-Length': '0', 'Connection': 'keep-alive', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Security-Policy': "frame-ancestors 'self'", 'Location': 'https://cornelius-k.dk/synspr\udcf8ve', 'D-Geo': 'US')>
((b'Server', b'nginx'), (b'Date', b'Tue, 26 Nov 2024 16:02:18 GMT'), (b'Content-Length', b'0'), (b'Connection', b'keep-alive'), (b'Cache-Control', b'no-cache, no-store, must-revalidate'), (b'Expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'X-Content-Type-Options', b'nosniff'), (b'X-Frame-Options', b'SAMEORIGIN'), (b'Content-Security-Policy', b"frame-ancestors 'self'"), (b'Location', b'https://cornelius-k.dk/synspr\xf8ve'), (b'D-Geo', b'US'))
(404, URL('https://cornelius-k.dk/synspr�ve'))

Python Version

3.9.20

aiohttp Version

3.11.7

multidict Version

6.1.0

propcache Version

0.2.0

yarl Version

1.17.1

OS

macOS

Related component

Client

Additional context

No response


bdraco commented 4 hours ago

Which setting are you using for requoting redirects: ClientSession(requote_redirect_url=True) or ClientSession(requote_redirect_url=False)?