Cadair / parfive

An asyncio based parallel file downloader for Python 3.8+
https://parfive.readthedocs.io/
MIT License

Redirects are not resolved #125

Closed: dreamflasher closed this issue 1 year ago

dreamflasher commented 1 year ago

Browsers or wget can download this file: https://traffic.libsyn.com/secure/revolutionspodcast/7.17-_The_Five_Days_of_Milan_Master.mp3?dest-id=159998

Parfive fails to download this with the following error:

[<parfive.results.Error object at 0x000002823035F360>
https://traffic.libsyn.com/secure/revolutionspodcast/7.17-_The_Five_Days_of_Milan_Master.mp3?dest-id=159998,
Download Failed: https://traffic.libsyn.com/secure/revolutionspodcast/7.17-_The_Five_Days_of_Milan_Master.mp3?dest-id=159998 with error <ClientResponse(https://content.libsyn.com/p/8/9/4/8947bc51039d1a6d/7.17-_The_Five_Days_of_Milan_Master.mp3?c_id=17921886&cs_id=17921886&destination_id=159998&response-content-type=audio/mpeg&Expires=1673986867&Signature=TFDZPtIhhSxftBYIzdK6JWhKa30AGtHUjCDHnHhIeHDNElKkPWcWIIR6oG1JmTvt8Xm~0CPf-ASzjjwSu14gE82JTy5poLXZluZ50noJXS3Pre7qIsY7zyc19Kkr~Mt1cMrSvuuk0C1Ps2oRL72q8cxm2uHR9fsN3T4lLMJtwFgytDez6h9-eUieZU4YoYDvCjLS4nI1jWKPGYx47HYfUR2VmcRVHtrc7~Jhet-JFKeyez72stNgZAcJlK9wnZ2pZdEYa2F49xipnEdfGx0d3NSY1rUpW729lA7nJY1EacMMj~C4wXZ6VSgR2mHmYY9o7c3RmzSgk9RFoGhtUZ-61Q__&Key-Pair-Id=K1YS7LZGUP96OI) [403 Forbidden]>
<CIMultiDictProxy('Server': 'CloudFront', 'Date': 'Tue, 17 Jan 2023 19:24:59 GMT', 'Content-Type': 'text/xml', 'Content-Length': '110', 'Connection': 'keep-alive', 'X-Cache': 'Error from cloudfront', 'Via': '1.1 2f927b8fefe61ec7dd1d6dda3df37d18.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'TXL50-P1', 'X-Amz-Cf-Id': 'Pr6RKop3Qxo9Y4scHoFSWwmfklfRuRybLlUy5l8of3ImBjC_rECnBQ==', 'Vary': 'Origin')>
]

Code to reproduce:

from parfive import Downloader

if __name__ == "__main__":
    dl = Downloader(max_splits=1, overwrite=True)
    dl.enqueue_file("https://traffic.libsyn.com/secure/revolutionspodcast/7.17-_The_Five_Days_of_Milan_Master.mp3?dest-id=159998", ".", "test.mp3")
    res = dl.download()
    print(res.errors)

dreamflasher commented 1 year ago

Apparently, extra kwargs passed to enqueue_file are forwarded to aiohttp's ClientSession.get, so I tried:

dl.enqueue_file("https://traffic.libsyn.com/secure/revolutionspodcast/7.17-_The_Five_Days_of_Milan_Master.mp3?dest-id=159998", ".", "test.mp3", allow_redirects=True)

Yet this yields the same error. Apart from this not working, I suggest making allow_redirects=True the default in parfive.

dreamflasher commented 1 year ago

I was able to solve this: the redirect is not resolved correctly because of a bug in aiohttp, which can be worked around by passing requote_redirect_url=False when creating the ClientSession. I'll create a PR to change this in parfive, although the default should really be changed in aiohttp.
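
To illustrate why re-quoting a redirect target can break a signed URL like the one in the error above, here is a minimal standard-library sketch. It is not aiohttp's actual code path, and the thread does not say which characters were altered. The URL, the `requote_query` helper and the percent-encoded `%2F` are all hypothetical, chosen only to show the general effect.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def requote_query(url: str) -> str:
    """Decode and re-encode the query string (hypothetical helper).

    This mimics, in spirit only, a client that normalizes the quoting of a
    redirect target before following it.
    """
    parts = urlsplit(url)
    # Percent-escapes such as %2F are decoded; "/" is then treated as safe
    # and is left unescaped when the query is encoded again.
    normalized = quote(unquote(parts.query), safe="=&/:~-_.")
    return urlunsplit(parts._replace(query=normalized))


# Hypothetical signed redirect target; the signature is assumed to cover the
# exact characters of the URL as issued by the server.
signed = (
    "https://cdn.example.invalid/file.mp3"
    "?response-content-type=audio%2Fmpeg&Signature=abc~def__"
)
requoted = requote_query(signed)

print(signed)
print(requoted)
print("unchanged:", requoted == signed)
```

Under these assumptions the re-quoted URL is no longer character-for-character identical to the one the server issued, and a CDN that validates the signature against the exact URL could then answer 403 Forbidden. In aiohttp, the workaround described above corresponds to constructing the session as `aiohttp.ClientSession(requote_redirect_url=False)`.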