ewels / sra-explorer

Web application to explore the Sequence Read Archive.
https://sra-explorer.info/
GNU General Public License v2.0

SRA Explorer not returning results #19

Closed tamuanand closed 4 years ago

tamuanand commented 4 years ago

Hi @ewels

https://sra-explorer.info/# is not returning any results (09-May-2020, 8:22 PM London time).

The EBI/ENA FTP site is probably down.

tamuanand commented 4 years ago

An update on the above: if you know the ascp command line for a particular record, the Aspera download itself still seems to work.

ascp -QT -l 300m -P33001 -i <path>/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/ERR036/ERR036000/ERR036000_1.fastq.gz .

ewels commented 4 years ago

Tested ERR036000 just now and it seemed to work fine, so I guess that this was just a temporary glitch in the matrix.

Let me know if it keeps happening 🤞

tamuanand commented 4 years ago

Thanks @ewels - yes, it was a temporary glitch.

I was using the fromSRA channel factory, which I believe is based on your code: https://www.nextflow.io/blog/2019/release-19.03.0-edge.html

One question and one suggestion:

Again, just a suggestion.

Needless to say, it is a great great tool.

On a side note, I have suggested to Paolo/Evan that NF should provide a method that returns ascp-compatible URLs when querying the SRA.

Right now I use fromSRA and then run this ugly-looking chain of perl regexes to turn each FTP URL into an Aspera-compatible download command, which I then pipe to bash. I would like to know your thoughts/ideas on the below:

echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR279/SRR279588/SRR279588_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR279/SRR279588/SRR279588_2.fastq.gz" \
| perl -pe 's#\.gz\s*$#.gz .\n#' \
| perl -pe 's#\.gz \.$#.gz . &&# unless eof' \
| perl -pe 's#ftp://ftp\.sra\.ebi\.ac\.uk/vol#ascp -QT -l 300m -P33001 -i <path_to>/asperaweb_id_dsa.openssh era-fasp\x40fasp.sra.ebi.ac.uk:vol#g' \
> SRR279588.txt

cat SRR279588.txt | bash

What that ultimately translates to is this command on the shell

ascp -QT -l 300m -P33001 -i <path_to>/asperaweb_id_dsa.openssh  era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR279/SRR279588/SRR279588_1.fastq.gz . 
&&
ascp -QT -l 300m -P33001 -i <path_to>/asperaweb_id_dsa.openssh  era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR279/SRR279588/SRR279588_2.fastq.gz . 

Hence, it would be nice to have a new channel factory or a method to get Aspera-compatible URLs with NF.
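As an alternative to the chained regexes above, the same FTP-to-ascp rewrite can be sketched as a small shell function. This is only an illustration, not anything Nextflow or SRA Explorer provides: the function name `ftp_to_ascp` and the `ASPERA_KEY` variable are hypothetical, and the ascp flags are simply copied from the command shown earlier.

```shell
# Convert one ENA FTP FastQ URL into an ascp command line (printed, not run).
# ASPERA_KEY is a hypothetical variable: set it to the location of your own
# asperaweb_id_dsa.openssh key file.
ftp_to_ascp() {
  local url="$1"
  local key="${ASPERA_KEY:-asperaweb_id_dsa.openssh}"
  local prefix='ftp://ftp.sra.ebi.ac.uk/'

  # Refuse anything that is not an ENA FTP URL rather than emit a broken command.
  case "$url" in
    "$prefix"*) ;;
    *) echo "not an ENA FTP URL: $url" >&2; return 1 ;;
  esac

  # Drop the FTP prefix; the remaining path (vol1/fastq/...) is reused unchanged.
  local path="${url#"$prefix"}"
  printf 'ascp -QT -l 300m -P33001 -i %s era-fasp@fasp.sra.ebi.ac.uk:%s .\n' "$key" "$path"
}
```

Calling `ftp_to_ascp` once per URL and writing the output to a file gives one command per line. Running that file with `bash -e` stops at the first failed download, which is roughly what chaining the commands with `&&` achieves.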

ewels commented 4 years ago

Yeah, I know I should catch errors. Kind of mentioned in https://github.com/ewels/sra-explorer/issues/7#issuecomment-493731019 and it's been in the back of my mind for a while. It's a bit crap to just silently die when it hits unexpected errors.

This tool needs quite a lot of work at the moment though, as the SRA is totally replacing their infrastructure so all of the SRA links are breaking. Unfortunately it's a fairly low priority project for me, so it'll probably take me a while to find time to invest here.

"Does SRA Explorer query NCBI or EBI to get the individual fastq runs?"

It queries NCBI first to find the runs and get SRA accessions. Once it has these for individual runs, it queries the EBI to get the FastQ download paths.
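The second step (run accession to FastQ paths) can be sketched in shell. The thread does not say which EBI endpoint SRA Explorer calls, so this assumes the public ENA Portal API `filereport` endpoint, and it assumes the `fastq_ftp` column holds semicolon-separated paths without a URL scheme. The function names `ena_filereport_url` and `filereport_to_urls` are hypothetical.

```shell
# Build an ENA Portal API "filereport" URL for one run accession.
# Assumption: this endpoint is used only for illustration; the thread does not
# state which EBI service SRA Explorer actually queries.
ena_filereport_url() {
  printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=fastq_ftp\n' "$1"
}

# Read a tab-separated filereport on stdin and print one ftp:// URL per file.
# The fastq_ftp column is located by its header name, then split on ";".
filereport_to_urls() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i; next }
    col && $col != "" {
      n = split($col, files, ";")
      for (i = 1; i <= n; i++) print "ftp://" files[i]
    }
  '
}
```

With network access, something like `curl -fsS "$(ena_filereport_url SRR279588)" | filereport_to_urls` would list the FTP URLs for that run; the `-f` flag makes curl exit non-zero on an HTTP error instead of failing silently.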

The ascp Nextflow factory sounds like a sensible idea. It might complicate things as it requires custom software though, whereas the simple URLs presumably work by default with Nextflow's built-in staging mechanisms (but this is a topic for the nextflow repo, not here 😉 )

Phil