ewels / sra-explorer

Web application to explore the Sequence Read Archive.
https://sra-explorer.info/
GNU General Public License v2.0

Get FastQ download links from the ENA #3

Closed ewels closed 5 years ago

ewels commented 5 years ago

Looks like the ENA has an API that can be used with SRA accession numbers to get FTP download URLs for FastQ files directly:

https://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=SRP043510&result=read_run&fields=fastq_ftp

fastq_ftp
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/004/SRR1448774/SRR1448774.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/005/SRR1448775/SRR1448775.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/006/SRR1448776/SRR1448776.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/007/SRR1448777/SRR1448777.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/008/SRR1448778/SRR1448778.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/009/SRR1448779/SRR1448779.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/000/SRR1448780/SRR1448780.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/001/SRR1448781/SRR1448781.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/002/SRR1448782/SRR1448782.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/003/SRR1448783/SRR1448783.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/004/SRR1448784/SRR1448784.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/005/SRR1448785/SRR1448785.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/006/SRR1448786/SRR1448786.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/007/SRR1448787/SRR1448787.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/008/SRR1448788/SRR1448788.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/009/SRR1448789/SRR1448789.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/000/SRR1448790/SRR1448790_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/000/SRR1448790/SRR1448790_2.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/001/SRR1448791/SRR1448791_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/001/SRR1448791/SRR1448791_2.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/002/SRR1448792/SRR1448792.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/003/SRR1448793/SRR1448793.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/004/SRR1448794/SRR1448794.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/005/SRR1448795/SRR1448795.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR191/002/SRR1910482/SRR1910482.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/SRR191/003/SRR1910483/SRR1910483.fastq.gz

https://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=SRR1448774&result=read_run&fields=fastq_ftp

fastq_ftp
ftp.sra.ebi.ac.uk/vol1/fastq/SRR144/004/SRR1448774/SRR1448774.fastq.gz

It should be possible to query this for each SRR accession to get download links. It may be a little slow, so it needs some thought about how to build it in.
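As a rough illustration of the idea above, here is a minimal Python sketch that builds the filereport URL for one accession and parses a response into per-run download links. It assumes the response has the single-column layout shown above (a `fastq_ftp` header line, then one line per run, with paired-end files separated by `;`). The function names are hypothetical, and prepending `ftp://` is an assumption, since the report lists bare host paths.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as quoted in this issue; it may have changed since.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/data/warehouse/filereport"


def filereport_url(accession):
    """Build the filereport query URL for one accession (hypothetical helper)."""
    query = urlencode(
        {"accession": accession, "result": "read_run", "fields": "fastq_ftp"}
    )
    return f"{ENA_FILEREPORT}?{query}"


def parse_fastq_ftp(report_text):
    """Return one list of FastQ URLs per run.

    Assumes a single 'fastq_ftp' column: the header line is skipped,
    and paired-end runs list their files separated by ';'.
    """
    lines = [line.strip() for line in report_text.splitlines() if line.strip()]
    if lines and lines[0] == "fastq_ftp":
        lines = lines[1:]
    # Assumption: the bare paths in the report are FTP locations.
    return [[f"ftp://{path}" for path in line.split(";") if path] for line in lines]


def fetch_fastq_links(accession):
    """Query the ENA for one accession (performs a network request)."""
    with urlopen(filereport_url(accession)) as response:
        return parse_fastq_ftp(response.read().decode("utf-8"))
```

Calling `fetch_fastq_links` once per SRR accession means one HTTP request per run, which is where the slowness would come from. Querying by project accession (for example `SRP043510`) returns every run in a single request instead.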

Phil

ewels commented 5 years ago

Done in ca67271

flashton2003 commented 5 years ago

Hi Phil,

I just stumbled across this while trying to solve a different problem. Can you tell me how to get the result which is displayed in the web browser via the command line?

I have a few hundred accessions (in different bioprojects), and can't figure out how to do this on the command line with wget or curl.

Thanks,

Phil

flashton2003 commented 5 years ago

Oh, actually, my colleague helped me sort it.

I just needed quotes around the URL being given to wget and then quotes around the weirdly named output file when I cat it.
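For anyone else hitting this: the URL contains `?` and `&`, which the shell interprets unless the URL is quoted. The sketch below uses Python's `shlex.quote` to print safely quoted commands for a list of accessions. It assumes wget's usual behaviour of naming the output file after the last part of the URL, query string included, which would explain the odd file name. The helper names are hypothetical.

```python
import shlex

# Endpoint as quoted earlier in this issue.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/data/warehouse/filereport"


def filereport_url(accession):
    """Build the filereport URL for one accession (hypothetical helper)."""
    return f"{ENA_FILEREPORT}?accession={accession}&result=read_run&fields=fastq_ftp"


def wget_command(url):
    """Return a wget command with the URL quoted for the shell."""
    return f"wget {shlex.quote(url)}"


def default_output_name(url):
    """Assumed wget default: everything after the last '/', query string included."""
    return url.rsplit("/", 1)[-1]


def cat_command(url):
    """Return a cat command for the downloaded file, with its name quoted."""
    return f"cat {shlex.quote(default_output_name(url))}"


if __name__ == "__main__":
    for accession in ["SRR1448774", "SRR1448775"]:
        url = filereport_url(accession)
        print(wget_command(url))
        print(cat_command(url))
```

Without the quotes, the shell treats each `&` as "run in the background" and splits the command there, so wget only sees the URL up to the first `&`.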