When downloading sequences for some samples (e.g., metagenomes), the download is very slow compared to smaller datasets.
Steps to reproduce:
Try to fetch sequences for ID ERR1700893 and observe the time it takes.
Expected behaviour:
The size of this dataset is approx. 28 GB, so fetching it should take roughly half an hour to an hour, depending on connection speed.
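As a rough sanity check of that estimate, here is a minimal sketch that converts a dataset size and a sustained link speed into transfer time. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbit = 10^6 bits) and ignores protocol overhead; `download_hours` is a hypothetical helper, not part of any tool.

```python
def download_hours(size_gb: float, mbit_per_s: float) -> float:
    """Estimated transfer time in hours (hypothetical helper).

    Assumes decimal units and a constant sustained throughput,
    with no protocol overhead or retries.
    """
    bits = size_gb * 1e9 * 8               # dataset size in bits
    seconds = bits / (mbit_per_s * 1e6)    # time at the given rate
    return seconds / 3600


if __name__ == "__main__":
    for rate in (62.5, 125.0):
        print(f"28 GB at {rate} Mbit/s: {download_hours(28, rate):.2f} h")
```

Under these assumptions, 28 GB needs a sustained rate of about 62 Mbit/s to finish in one hour and about 125 Mbit/s to finish in half an hour.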
Actual behaviour:
It takes hours (I don't know exactly how long; I didn't wait for it to finish).
The problem is that, for large datasets, `prefetch` silently fails because its default maximum allowed download size is 20 GB, and this dataset is about 28 GB. `fasterq-dump` then takes over, but it is much slower. This can easily be fixed by setting the `max-size` parameter of `prefetch` to unlimited so that data of any size can be downloaded. See here for more info.
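A minimal sketch of the fix, assuming `prefetch` from sra-tools is on `PATH` and accepts `--max-size` with a size suffix such as `100G`. The exact spelling for "unlimited" may differ between sra-tools versions, so check `prefetch --help` for your version before relying on it. The helper names (`exceeds_default_limit`, `build_prefetch_cmd`, `run_prefetch`) are hypothetical. `check=True` only surfaces the failure if `prefetch` reports it through a non-zero exit status, which is also an assumption here.

```python
import shutil
import subprocess

# Default prefetch size limit described in this issue (20 GB).
DEFAULT_LIMIT_GB = 20.0


def exceeds_default_limit(dataset_gb: float, limit_gb: float = DEFAULT_LIMIT_GB) -> bool:
    """True if the dataset is larger than prefetch's default limit."""
    return dataset_gb > limit_gb


def build_prefetch_cmd(accession: str, max_size: str = "100G") -> list[str]:
    """Build a prefetch command with an explicit, raised size limit.

    ASSUMPTION: '--max-size' accepts values like '100G'; verify the
    accepted format (and any 'unlimited' spelling) with 'prefetch --help'.
    """
    return ["prefetch", "--max-size", max_size, accession]


def run_prefetch(accession: str, max_size: str = "100G") -> None:
    """Run prefetch and raise instead of continuing silently on failure."""
    if shutil.which("prefetch") is None:
        raise FileNotFoundError("prefetch (sra-tools) not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_prefetch_cmd(accession, max_size), check=True)


if __name__ == "__main__":
    # ERR1700893 is ~28 GB, so it exceeds the 20 GB default.
    print(exceeds_default_limit(28.0))
    print(build_prefetch_cmd("ERR1700893"))
```

Raising the limit explicitly, rather than relying on the default, keeps `prefetch` from skipping large runs and handing the whole download to the slower `fasterq-dump` path.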