ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

Add download retry - `fastq-dump` timeout error 3 #63

Closed ababaian closed 4 years ago

ababaian commented 4 years ago

ERROR

Occasionally (20/800 samples) `fastq-dump` throws a "timed out" error while retrieving information about an SRA entry.

SRR11479046 split_err 2020-04-28 02:19:34.819629 2020-04-28 02:23:08.818504 i-059b34ac3dad98dba-2

Gave error:

02:21:58
+ parallel --block 100M --pipe -N1000000 /home/serratus/s3_cp_formatted.sh s3://tf-serratus-work-20200428020557609100000001/fq-blocks/SRR11412326/SRR11412326.1.fq.%010d '{#}'

02:22:08
2020-04-28T02:22:08 fastq-dump.2.10.4 err: connection busy while creating file within network system module - error with https open 'https://sra-download.ncbi.nlm.nih.gov/traces/sra52/SRR/011210/SRR11479046'

02:23:08
2020-04-28T02:23:08 fastq-dump.2.10.4 err: timeout exhausted while creating file within network system module - error with https open 'https://sra-download.ncbi.nlm.nih.gov/traces/sra52/SRR/011210/SRR11479046'

02:23:08
2020-04-28T02:23:08 fastq-dump.2.10.4 err: timeout exhausted while creating file within network system module - failed to open 'SRR11479046'

02:23:08
fastq-dump (PID 2808) quit with error code 3

Desired behavior

If `fastq-dump` fails with this particular error (exit code 3), either set this entry's download to a "retry" state, or wait 20 s and retry on the same worker node, rather than failing the entire download and marking it `split_err`. Retry a maximum of 2 times (3 tries total) before moving to the fail state.
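The retry behavior described above could be sketched as a small wrapper in the download script. This is a hypothetical sketch, not the actual serratus code: the function name `retry_on_code3`, the `RETRY_WAIT` knob, and the usage line are all illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical sketch: retry a command only when it exits with code 3
# (the code fastq-dump uses for the "timeout exhausted" failure above).
retry_on_code3() {
  local max_tries=3                 # 1 initial attempt + 2 retries
  local wait_s="${RETRY_WAIT:-20}"  # seconds to back off between attempts
  local try=1
  while true; do
    "$@" && return 0                # success: pass through immediately
    local rc=$?
    # Any other exit code, or too many tries: give up so the caller
    # can mark the accession split_err / fail.
    if [ "$rc" -ne 3 ] || [ "$try" -ge "$max_tries" ]; then
      return "$rc"
    fi
    sleep "$wait_s"                 # wait, then retry on the same worker
    try=$((try + 1))
  done
}

# Illustrative usage (accession variable is hypothetical):
# retry_on_code3 fastq-dump --split-files "$ACCESSION"
```

Keeping the retry on the same worker node avoids re-queueing the accession, at the cost of holding the instance for up to ~40 s of extra sleep on a persistently failing entry.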

Location

Around here in the script

ababaian commented 4 years ago

This is bypassed: across thousands of runs I just accept that some will fail, and upon completion of the first "try", all `split_err` accessions are manually reset to `new` and tried again.