Public-Health-Bioinformatics / cpo-pipeline

An analysis pipeline for the purpose of investigating Carbapenemase-Producing Organisms.
MIT License

Concurrent NCBI Refseq Downloads #45

Closed by dfornika 5 years ago

dfornika commented 5 years ago

NCBI limits RefSeq downloads to 3 per second.

We download 'candidate' refseq plasmid reference files here: https://github.com/Public-Health-Bioinformatics/cpo-pipeline/blob/6e5a65c045e3bb5a2ac5bfd2e7cdb7b9aafa1c94/cpo_pipeline/plasmids/strategies.py#L330

For each sample, we pause 2 seconds before each download. But that pause only throttles a single sample: if many samples are analyzed simultaneously, their combined downloads could still exceed the limit of 3 per second.

Possible solution: catch the error, wait a few seconds, and retry the download.
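
A rough sketch of the catch, wait, and retry idea, not the pipeline's actual code. `retry_with_backoff` and `download_reference` are hypothetical names, and treating `urllib.error.URLError` as the error raised when the limit is hit is an assumption (the real exception depends on how the download is performed in `strategies.py`). The wait doubles on each attempt and adds a little random jitter, so samples running in parallel are less likely to all retry at the same moment.

```python
import random
import time
import urllib.error
import urllib.request


def retry_with_backoff(action, max_attempts=5, base_delay=2.0,
                       retry_on=(urllib.error.URLError,), sleep=time.sleep):
    """Call action(); if it raises a retryable error, wait and try again.

    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the original error.
                raise
            # Wait base_delay, 2x, 4x, ... seconds plus up to 1 s of jitter.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 1)
            sleep(delay)


def download_reference(url, path):
    """Hypothetical helper: fetch one reference file with retries."""
    def fetch():
        urllib.request.urlretrieve(url, path)
        return path
    return retry_with_backoff(fetch)
```

This only reduces the chance of repeated failures. It does not coordinate the download rate across samples, so a shared rate limiter would still be needed to guarantee staying under the limit.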