Cadair / parfive

An asyncio-based parallel file downloader for Python 3.8+
https://parfive.readthedocs.io/
MIT License

Option to use external download manager (wget or curl) #147

Open wxguy opened 7 months ago

wxguy commented 7 months ago

This is the kind of package I was looking for. The API is simple to use.

I have an issue, though: parfive quite often fails to download files from URLs, typically after it has already spent a long time downloading. To overcome this, I propose either of the following options:

  1. Provide a resume option in the API.
  2. Provide an external downloader engine such as wget or curl (with a keyword like engine='wget'), both of which handle this beautifully; a rough sketch of what that would do is below.
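
For illustration, a minimal sketch of what such an engine would do under the hood, assuming wget is available on PATH (the engine= keyword itself is hypothetical and does not exist in parfive):

import subprocess

def wget_download(url, dest_dir="."):
    # wget -c/--continue resumes a partially downloaded file;
    # -P sets the directory the file is saved into.
    subprocess.run(["wget", "-c", "-P", dest_dir, url], check=True)

wget_download("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt")
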
Cadair commented 7 months ago

Hi :wave: glad you find parfive useful.

You can retry failed downloads with retry, but I don't think it will resume partial downloads (that would be a good feature to add though, see #10).

I don't think I will ever support alternative download engines, especially not ones which require shelling out to a binary. That sounds like it would be a very different code path and quite hard to fit into the same API. There is #143 where I am considering ditching aiohttp.

Can you elaborate on the issues you are facing? Have you looked into whether any aiohttp settings could make your particular downloads more reliable?
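
For example (a sketch, assuming parfive 2.x, where SessionConfig forwards an aiohttp.ClientTimeout to the session), a longer per-socket read timeout and fewer parallel connections can help with slow or flaky servers:

import aiohttp
from parfive import Downloader, SessionConfig

# No overall deadline, but give each socket read up to 5 minutes,
# and limit the number of simultaneous connections.
config = SessionConfig(timeouts=aiohttp.ClientTimeout(total=0, sock_read=300))
dl = Downloader(max_conn=2, config=config)
dl.enqueue_file("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt", path="./")
files = dl.download()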

wxguy commented 7 months ago

Thank you for your response.

My use case is that my application has to download approximately 100+ files from various sources to create the final product. If even one file is missing or fails to download, the final product cannot be created. One way to resolve this is to download only the missing files, or to resume partial downloads. While re-downloading all the files again is possible, the time taken and the overall size of the downloads would be a huge downside for me.

I don't think I will ever support alternative download engines, especially not ones which require shelling out to a binary.

I understand the reason why you won't be implementing this feature.

I have limited exposure to the aiohttp library. However, various web resources indicate that it is possible to implement resuming of partial downloads. One good example for aiohttp is https://stackoverflow.com/questions/58448605/download-file-with-resume-capability-using-aiohttp-and-python, and https://stackoverflow.com/questions/22894211/how-to-resume-file-download-in-python covers the approach for a general download engine. An additional reference is https://stackoverflow.com/questions/12243997/how-to-pause-and-resume-download-work.
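
The core of those answers is an HTTP Range request: check how much of the file is already on disk and ask the server for only the remaining bytes. A minimal standalone sketch of that approach with aiohttp (not parfive's API; the URL is just the sample file from above):

import asyncio
import os

import aiohttp

async def resume_download(url, dest):
    # Ask the server for only the bytes we do not already have.
    start = os.path.getsize(dest) if os.path.exists(dest) else 0
    headers = {"Range": f"bytes={start}-"} if start else {}
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as resp:
            # 206 Partial Content means the Range request was honoured;
            # otherwise the server sent the whole file, so start over.
            mode = "ab" if resp.status == 206 else "wb"
            with open(dest, mode) as f:
                async for chunk in resp.content.iter_chunked(64 * 1024):
                    f.write(chunk)

asyncio.run(resume_download(
    "http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt",
    "predicted-sunspot-radio-flux.txt",
))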

If you can implement this feature, it would be a great help and an excellent addition to parfive, as I have not seen any other similar library offer it.

Thank you in advance.

Cadair commented 7 months ago

One way to resolve this is to download only the missing files

retry should do this for you.

from parfive import Downloader

dl = Downloader()
dl.enqueue_file("http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt", path="./")
files = dl.download()
# retry() re-attempts any URLs that errored and returns an updated Results object.
if files.errors:
    files = dl.retry(files)

Resuming partial downloads would be good, but I don't have time to work on that in the near term. If you or someone else can pick it up, I would be happy to help.

wxguy commented 7 months ago

retry should do this for you.

I did this already, but a few partial downloads are still left over, and they are skipped by parfive.
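
As a hypothetical stop-gap until resume support exists, truncated files can be detected by comparing their size on disk against the server's reported Content-Length, and deleted so the next Downloader run fetches them afresh (a sketch; the URL-to-path mapping is illustrative):

import os
import urllib.request

def is_complete(url, path):
    # Compare the local file size against the server's Content-Length.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        length = resp.headers.get("Content-Length")
    # If the server does not report a length, we cannot verify the file.
    return length is None or os.path.getsize(path) == int(length)

files_to_check = {
    "http://data.sunpy.org/sample-data/predicted-sunspot-radio-flux.txt":
        "./predicted-sunspot-radio-flux.txt",
}
for url, path in files_to_check.items():
    if os.path.exists(path) and not is_complete(url, path):
        os.remove(path)  # let the next download run re-fetch this file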

Anyway, thank you for listening. I hope to see a new release with this feature in the near future.

Thank you.

Cadair commented 7 months ago

a few partial downloads are still left over, and they are skipped by parfive

This feels like a bug. I don't suppose you have a way to reliably reproduce it? Is the issue that the failed download files are not being deleted correctly?