Open ldebortoli opened 2 years ago
Can be fixed by using ClientSession(requote_redirect_url=False)
I don't know what would be a better solution here, interested to hear core devs opinion.
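For readers landing here, the workaround mentioned above might look roughly like the following. This is a minimal sketch, not code from the original report: the helper name `fetch_html` is made up for illustration, and the only piece taken from this thread is the `requote_redirect_url=False` session argument.

```python
import aiohttp


async def fetch_html(url: str) -> str:
    """Hypothetical helper: fetch a page, following redirects as sent."""
    # requote_redirect_url=False asks the session to follow the Location
    # header as received instead of re-quoting it first.
    async with aiohttp.ClientSession(requote_redirect_url=False) as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.text()
```

It would be called with something like `asyncio.run(fetch_html(url))`. Note that the setting applies to the whole session, so every request made through that session follows redirects unmodified.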
Well, that certainly looks like the fix.
I guess the question is whether it makes sense that it requotes by default... Is there a security concern here? @webknjaz
For reference, the parameter was added in response to #1474. It appears there was no discussion or consideration of changing the default behaviour.
Not sure, I wouldn't change this — there's always going to be somebody for whom the current default is better and vice versa.
I'm just thinking that for the average user, they expect the redirect to just work correctly. But, requoting in some cases breaks the redirection, and the user doesn't know why and reports a bug like this one. If there's no security issue here, then I think it makes more sense to default to the approach that's expected to work (and must be what requests and all browsers do).
@Dreamsorcerer there might be security issue potential with the Location header, but I don't remember exactly off the top of my head. It's probably dangerous for apps that re-expose/reinterpret the resulting URL, maybe proxies (request smuggling et al).
Yeah, I'm just struggling to think how at the moment. If we are going to accept an arbitrary URL and follow the redirection, I'm just not sure what difference it can make by quoting the arguments. It can send us to an attacker URL or whatever regardless...
One of the past vulnerabilities allowed piggybacking an additional HTTP request through pipelining, with an app being behind a proxy that interpreted the request differently and split it into two. That was a server-side thing, though. It's usually obvious after such a thing happens, and doesn't seem probable before 🤷‍♂️
If the redirect URL is equal to the one requested by the user, I think there is no point in requesting that URL in a loop until aiohttp.client_exceptions.TooManyRedirects is raised.
I assume that in this case a warning could be logged or another exception raised, reporting that the redirect URL is the same as the requested one (or any other in the redirection chain) and suggesting requote_redirect_url=False.
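To make the suggestion above concrete, here is a small offline sketch of what such a check could look like. Everything in it is hypothetical: `RedirectLoopError`, `follow` and the fake redirect table are invented for illustration and are not part of aiohttp, and the `normalize` callback only stands in for the requoting step.

```python
class RedirectLoopError(Exception):
    """Hypothetical exception; aiohttp does not define this."""


def follow(start, redirects, normalize, max_redirects=10):
    """Follow a simulated redirect table, failing fast on a repeated URL."""
    seen = [start]
    url = start
    for _ in range(max_redirects):
        if url not in redirects:
            return url  # no further redirect: this is the final URL
        nxt = normalize(redirects[url])
        if nxt in seen:
            raise RedirectLoopError(
                f"redirect to already visited URL {nxt!r}; "
                "consider requote_redirect_url=False"
            )
        seen.append(nxt)
        url = nxt
    raise RedirectLoopError("too many redirects")


# Simulated server: the "&" URL redirects to its percent-encoded form.
table = {"/wiki/A_&_B": "/wiki/A_%26_B"}


def requote(location):
    # Stand-in for requoting: turns %26 back into "&".
    return location.replace("%26", "&")


def keep(location):
    # Stand-in for requote_redirect_url=False: use Location as sent.
    return location


print(follow("/wiki/A_&_B", table, keep))
try:
    follow("/wiki/A_&_B", table, requote)
except RedirectLoopError as exc:
    print(exc)
```

With the `keep` callback the chain ends after one hop, while with `requote` the loop is reported on the first repeated URL instead of after the full redirect budget is spent.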
If the requested redirect URL is equal to requested by user, I think there is no point in going to that URL in loop until aiohttp.client_exceptions.TooManyRedirects is raised.
Well, I think this behaviour is actually consistent with browsers, which will try around 20 redirects before giving up, even if they are circular.
Also, the redirect URL is different from the URL we actually requested, it's only after we requote the URL that it becomes the same.
Maybe, if we aren't going to change the default, we could at least add a message to that exception suggesting that requote_redirect_url=False might be what they are looking for. Though this would only help a small portion of people, as most people are probably getting redirected to a wrong page (e.g. 404 or home page) that doesn't redirect again.
Describe the bug
I'm scraping the Marvel Fandom site using asyncio tasks. Every time I try to get the HTML document from a URL that contains the '&' symbol, I hit the same issue.
For example with: https://marvel.fandom.com/wiki/Amazing_Spider-Man_&_Silk:_The_Spider(fly)_Effect_Vol_1
This link redirects to another link where the & symbol is replaced with its percent-encoding (& = %26). So the actual URL of the page is https://marvel.fandom.com/wiki/Amazing_Spider-Man_%26_Silk:_The_Spider(fly)_Effect_Vol_1
Then I run the following code and I get an error
To catch the exception I made the following:
As you can see, the client redirects many times until the maximum number of redirects is reached. The problem is that the URL obtained from the redirect ('Location': 'https://marvel.fandom.com/wiki/Amazing_Spider-Man_%26_Silk:_The_Spider(fly)_Effect_Vol_1') is apparently requoted before being requested, so %26 is replaced by & again.
The same error occurs if I use https://marvel.fandom.com/wiki/Amazing_Spider-Man_%26_Silk:_The_Spider(fly)_Effect_Vol_1 from the start: %26 is replaced by & and it redirects until the limit is reached. To make it work I had to replace & with %%26 so it parses to a literal "%26" in the URL.
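The loop described above can be reproduced without any network access. The sketch below uses only the standard library and is an approximation, not aiohttp's or yarl's actual code: the `requote` helper and the `PATH_SAFE` character set are assumptions chosen to mimic a normalizer that leaves "&" unencoded in the path.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Assumption: characters a normalizer treats as safe in a path and so
# leaves (or restores) unencoded. "&" is among them.
PATH_SAFE = "/:@!$&'()*+,;=-._~"


def requote(url: str) -> str:
    """Decode the path, then re-encode only characters outside PATH_SAFE."""
    parts = urlsplit(url)
    path = quote(unquote(parts.path), safe=PATH_SAFE)
    return urlunsplit(parts._replace(path=path))


requested = (
    "https://marvel.fandom.com/wiki/"
    "Amazing_Spider-Man_&_Silk:_The_Spider(fly)_Effect_Vol_1"
)
location = (
    "https://marvel.fandom.com/wiki/"
    "Amazing_Spider-Man_%26_Silk:_The_Spider(fly)_Effect_Vol_1"
)

# The server's Location differs from the requested URL...
print(location == requested)
# ...but after requoting it collapses back to the requested URL, so the
# client asks for the same page again and gets redirected again.
print(requote(location) == requested)
```

Under these assumptions the two URLs differ as sent but become identical once requoted, which is why the chain never makes progress.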
To Reproduce
Expected behavior
Redirects correctly
Logs/tracebacks
Python Version
aiohttp Version
multidict Version
yarl Version
OS
Manjaro, Linux
Related component
Client
Additional context
No response