ewels / sra-explorer

Web application to explore the Sequence Read Archive.
https://sra-explorer.info/
GNU General Public License v2.0
203 stars 29 forks

cURL download for fastq files is smaller than expected. #39

Closed (blizard-wizard closed 1 year ago)

blizard-wizard commented 1 year ago

I used cURL to download some fastqs from SRA (SRP252588). There are 8 files in the study and I want the last 4. I grabbed the bash scripts generated by SRA explorer and set up a loop to download the list of files in the background. I've done this many times before with great success. Anyway... all 4 files appear to have downloaded correctly, but the last one (SRR11296682) only shows as 22 bytes. When I download the same file through Chrome I get a 394 MB file.

Any ideas? Is there a way to verify the fastq integrity after download?

I ran cURL in verbose mode to troubleshoot, but everything seems OK. See output below (format might be screwed up).

(base) user@directory % curl -l -v ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR112/082/SRR11296682/SRR11296682.fastq.gz -o SRR11296682_GSM4408816_Control_replicate_4_Homo_sapiens_RNA-Seq.fastq.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 193.62.193.138:21...
 Connected to ftp.sra.ebi.ac.uk (193.62.193.138) port 21 (#0)
 < 220-Welcome to ftp.ebi.ac.uk
 < 220 
 > USER anonymous
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0< 230 Login successful.
 > PWD
 < 257 "/" is the current directory
  Entry path is '/'
 > CWD vol1
  ftp_perform ends with SECONDARY: 0
 < 250 Directory successfully changed.
 > CWD fastq
< 250 Directory successfully changed.
> CWD SRR112
< 250 Directory successfully changed.
> CWD 082
< 250 Directory successfully changed.
> CWD SRR11296682
< 250 Directory successfully changed.
> EPSV
 Connect data stream passively
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0< 229 Entering Extended Passive Mode (|||54753|)
  Trying 193.62.193.138:54753...
 Connecting to 193.62.193.138 (193.62.193.138) port 54753
 Connected to ftp.sra.ebi.ac.uk (193.62.193.138) port 21 (#0)
> TYPE A
< 200 Switching to ASCII mode.
> NLST
< 150 Here comes the directory listing.
 Maxdownload = -1
{ [22 bytes data]
 Remembering we are in dir "vol1/fastq/SRR112/082/SRR11296682/"
< 226 Directory send OK.
100    22    0    22    0     0     13      0 --:--:--  0:00:01 --:--:--    14
 Connection #0 to host ftp.sra.ebi.ac.uk left intact

ewels commented 1 year ago

Hi @blizard-wizard,

This is beyond the scope of sra-explorer, sorry. It really is only for fetching the metadata. Anything downstream of that, such as downloading and data integrity, is the realm of the SRA / ENA directly.

The ENA docs mention that "most directories" have checksum files (link) so you could try that for checking md5sums.
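As a rough sketch of what such a check could look like once you have a published checksum in hand: the helper names `md5_of` and `verify_fastq` below are made up for illustration, and the expected MD5 is something you would have to copy from ENA yourself. It first tests that the file is a readable gzip stream, then compares the digest.

```shell
# Print the MD5 hex digest of a file, using whichever tool is installed
# (md5sum on most Linux systems, md5 on macOS).
md5_of() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | awk '{print $1}'
    else
        md5 -q "$1"
    fi
}

# verify_fastq FILE EXPECTED_MD5  (hypothetical helper, not part of sra-explorer)
# Returns 0 only if FILE is a readable gzip stream and its MD5 matches.
verify_fastq() {
    file=$1
    expected=$2
    # gzip -t decompresses to nowhere and fails on truncated or non-gzip data.
    if ! gzip -t "$file" 2>/dev/null; then
        echo "FAIL (not valid gzip): $file"
        return 1
    fi
    actual=$(md5_of "$file")
    if [ "$actual" != "$expected" ]; then
        echo "FAIL (md5 $actual != $expected): $file"
        return 1
    fi
    echo "OK: $file"
}
```

Usage would be something like `verify_fastq SRR11296682_GSM4408816_Control_replicate_4_Homo_sapiens_RNA-Seq.fastq.gz <md5-from-ENA>`. Running `gzip -t` first means a tiny or truncated file is reported as such, rather than only as a checksum mismatch.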

Good luck!

Phil
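
A note on the verbose log in the original post, offered as a likely reading rather than a confirmed diagnosis: the command uses lowercase `-l`, which is curl's `--list-only` option, whereas uppercase `-L` is `--location` (follow redirects). The log shows curl sending `NLST` and the server answering "Here comes the directory listing", with no `RETR` for the file. The 22 bytes would be consistent with a one-entry listing: the 20-character name `SRR11296682.fastq.gz` plus a CRLF line ending. The sketch below assumes that reading; `looks_like_gzip` is a hypothetical helper, and the `curl` line is left commented out because it needs network access.

```shell
# Hypothetical helper: true if the file starts with the gzip magic bytes 1f 8b.
looks_like_gzip() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

url='ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR112/082/SRR11296682/SRR11296682.fastq.gz'
out='SRR11296682_GSM4408816_Control_replicate_4_Homo_sapiens_RNA-Seq.fastq.gz'

# Same command as in the post, but with uppercase -L instead of lowercase -l.
# Uncomment to run the download.
# curl -L "$url" -o "$out"

# Cheap sanity check after a download: a saved directory listing is plain
# text, so it will not start with the gzip magic bytes.
if [ -f "$out" ]; then
    if looks_like_gzip "$out"; then
        echo "$out starts with gzip magic bytes"
    else
        echo "$out is not gzip data; first line: $(head -n 1 "$out")"
    fi
fi
```

If the saved file turns out to contain just the file name, that would support the `--list-only` explanation.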