eastgenomics / Genetics_Ark

Django source code for Genetics Ark and associated apps

Parallelise search step to shorten the run time of find_dx_data #73

Closed by corbin-chris 4 months ago

corbin-chris commented 4 months ago

find_dx_data, the script which updates the list of available files, currently takes more than 10 minutes to run. Parallelising some of the search steps could shorten the run time. An example (dx_find_in_parallel) is here: https://github.com/eastgenomics/dias_reports_bulk_reanalysis/blob/c72dea0b94893a4a8079178c0ff455f7dd4917ae/bin/utils/dx_manage.py#L153
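A rough sketch of what parallelising the search step could look like with the standard library's concurrent.futures. This is not a copy of the linked dx_find_in_parallel and not the existing find_dx_data code: `find_in_parallel`, `fake_search`, the `targets` argument and the `max_workers` default are all hypothetical names chosen for illustration. In practice `search_fn` would presumably wrap whatever DNAnexus search call the script already makes for one project or folder.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def find_in_parallel(search_fn, targets, max_workers=8):
    """Run search_fn(target) for every target concurrently and merge the results.

    search_fn is a hypothetical callable that takes one target (for example
    a project ID) and returns a list of matches for that target.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit one search per target and remember which future belongs to which target.
        futures = {executor.submit(search_fn, target): target for target in targets}
        # Collect results in completion order, so slow searches do not block fast ones.
        for future in as_completed(futures):
            target = futures[future]
            try:
                results.extend(future.result())
            except Exception as err:
                # Report which target failed instead of silently dropping it.
                raise RuntimeError(f"search failed for {target}") from err
    return results


def fake_search(target):
    """Stand-in for a real search call (assumption, for demonstration only)."""
    return [f"{target}:file_{i}" for i in range(2)]


if __name__ == "__main__":
    found = find_in_parallel(fake_search, ["project-A", "project-B"])
    print(sorted(found))
```

Threads rather than processes are used here on the assumption that the search step is dominated by waiting on API responses, not by CPU work. Results arrive in completion order, so sort them afterwards if the order matters.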

It may also be worth using the list of 002 project IDs, rather than working on a concurrent.futures implementation.