Open ross-spencer opened 6 years ago
Hi @sevein, I wondered if you had an opinion about the above? Further, with our own async module, would that impact doing something like from requests import async?
You could do what you're describing. You'd need to remember that we're using gevent, so the standard library is already patched and it's cooperative. E.g. if you decided to implement a solution based on concurrent.futures.ThreadPoolExecutor (also available in py2 via futures), it's good to know that you're not really using threads but greenlets. It shouldn't change much but it's worth understanding the difference.
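For illustration, here is a minimal sketch of what a ThreadPoolExecutor-based approach could look like. It is not the storage service's code: `download_all` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever function downloads a single Dataverse file. Under gevent's monkey patching the workers would be greenlets rather than OS threads, as noted above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, max_workers=4):
    """Call fetch(url) for every URL concurrently.

    Hypothetical helper: `fetch` is any callable that downloads one file
    and returns a result (or raises on failure).
    Returns a (results, errors) pair of dicts keyed by URL.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it was submitted for.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Record the failure instead of aborting the other downloads.
                errors[url] = exc
    return results, errors
```

Collecting failures per URL, rather than letting the first exception propagate, leaves the caller free to decide whether to retry, report, or abort.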
Be aware that it's an area where we may not be doing our best to handle errors properly, e.g. what would happen if the operation fails / raises an exception? Are we doing proper error handling? Should we update AsyncManager so it can retry operations? Are we reporting the error to the user? These are questions that you may want to ask yourself while planning to work on performance improvements.
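To make the retry question concrete, here is a rough sketch of a generic retry wrapper. `with_retries` and its parameters are hypothetical and are not part of AsyncManager; it only shows one way a failing operation could be retried before the error is surfaced to the caller.

```python
import time


def with_retries(operation, attempts=3, delay=0.0, exceptions=(Exception,)):
    """Call operation() up to `attempts` times and return its result.

    Hypothetical helper: waits delay * attempt_number seconds between tries
    and re-raises the last error once all attempts have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except exceptions as exc:
            last_error = exc
            if attempt < attempts:
                # Linear backoff; with gevent patching this sleep is cooperative.
                time.sleep(delay * attempt)
    raise last_error
```

Re-raising the final error keeps the failure visible, so the calling code still has to decide how it is reported to the user.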
Please describe the problem you'd like to be solved.
I would like to see this portion of the code corrected to enable concurrent downloading of a Dataverse payload: https://github.com/artefactual/archivematica-storage-service/blob/9e6f97392042997bfd7ee251308e0708f514860e/storage_service/locations/models/dataverse.py#L253-L275
Files are currently downloaded one at a time, and the resulting slowness impacts user efficiency, or at least the speed of transfers.
Describe the solution you'd like to see implemented.
The Python requests library has supported concurrent downloads (the async module in the 0.10.x release linked under Additional Context); we could try a solution such as the one outlined on Stack Overflow: https://stackoverflow.com/a/9189249
Describe alternatives you've considered.
Archivematica might already use alternative libraries or mechanisms for this.
Additional Context.
Async requests: http://docs.python-requests.org/en/v0.10.6/user/advanced/#asynchronous-requests