Closed tamuanand closed 4 years ago
An update on the above: if you know the ascp command line for a particular record, the Aspera download does seem to work:
ascp -QT -l 300m -P33001 -i <path>/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/ERR036/ERR036000/ERR036000_1.fastq.gz .
Tested ERR036000 just now and it seemed to work fine, so I guess that this was just a temporary glitch in the matrix.
Let me know if it keeps happening 🤞
Thanks @ewels - yes, it was a temporary glitch.
I was using the fromSRA channel factory and I believe it is based off your code - https://www.nextflow.io/blog/2019/release-19.03.0-edge.html
One question and one suggestion:
question: Does SRA Explorer query NCBI or EBI to get the individual fastq runs? I believe it is NCBI, judging by the fromSRA error messages.
suggestion: fromSRA was returning error messages like "can't do nulls on uids". It would be nice to see a similar error reported in SRA Explorer when someone searches for an SRA ID (or anything else) and the underlying system (NCBI or EBI) has a glitch. In my case, I kept hitting submit with an ID and did not see the bottom of the page change, so I was worried that something was wrong with my browser.
Again, just a suggestion.
Needless to say, it is a great great tool.
On a side note, I have suggested to Paolo/Evan that NF should develop a method to return ascp compatible urls when querying for SRA.
Right now I use fromSRA and then have this ugly looking chain of perl regexes to ultimately get to an Aspera-compatible download command, followed by a pipe to bash. I would like to know your thoughts/ideas on the below:
echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR279/SRR279588/SRR279588_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR279/SRR279588/SRR279588_2.fastq.gz" |
perl -pe 's#\.gz$#.gz .#' | perl -pe 's#\.gz \.$#.gz . &&# unless eof' |
perl -pe 's#ftp://ftp\.sra\.ebi\.ac\.uk/vol#ascp -QT -l 300m -P33001 -i <path_to>/asperaweb_id_dsa.openssh era-fasp\x40fasp.sra.ebi.ac.uk:vol#' > SRR279588.txt
cat SRR279588.txt | bash
What that ultimately translates to is this command on the shell:
ascp -QT -l 300m -P33001 -i <path_to>/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR279/SRR279588/SRR279588_1.fastq.gz . &&
ascp -QT -l 300m -P33001 -i <path_to>/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR279/SRR279588/SRR279588_2.fastq.gz .
Hence, it would be nice to have a new channel factory or a method to get aspera compatible urls with NF.
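Until something like that exists, the regex chain could probably be replaced by a small shell function that strips the FTP host prefix and prints the matching ascp command. This is only a sketch: the function name `ftp_to_ascp` and its key-path argument are made up for illustration, and the ascp flags are simply copied from the commands above.

```shell
# Hypothetical helper (not part of Nextflow or SRA Explorer).
# Usage: ftp_to_ascp <ena_ftp_url> <path_to_aspera_key>
ftp_to_ascp() {
  url=$1
  key=$2
  prefix='ftp://ftp.sra.ebi.ac.uk/'
  case $url in
    "$prefix"*) ;;  # looks like an ENA FTP URL, carry on
    *) echo "not an ENA FTP URL: $url" >&2; return 1 ;;
  esac
  # Drop the scheme and host, keeping e.g. vol1/fastq/...
  path=${url#"$prefix"}
  printf 'ascp -QT -l 300m -P33001 -i %s era-fasp@fasp.sra.ebi.ac.uk:%s .\n' "$key" "$path"
}

# Example: print one ascp command per URL listed in urls.txt
# while read -r u; do ftp_to_ascp "$u" "$HOME/asperaweb_id_dsa.openssh"; done < urls.txt
```

Printing the commands rather than running them keeps the output easy to inspect before piping it to bash, and a URL that does not match the expected host is rejected instead of being silently passed through.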
Yeah, I know I should catch errors. Kind of mentioned in https://github.com/ewels/sra-explorer/issues/7#issuecomment-493731019 and it's been in the back of my mind for a while. It's a bit crap to just silently die when it hits unexpected errors.
This tool needs quite a lot of work at the moment though, as the SRA is totally replacing their infrastructure, so all of the SRA links are going to stop working. Unfortunately it's a fairly low priority project for me so it'll probably take me a while until I can find time to invest here.
> Does SRA Explorer query NCBI or EBI to get the individual fastq runs?
It queries NCBI first to find the runs and get SRA accessions. Once it has these for individual runs, it queries the EBI to get the FastQ download paths.
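As a rough illustration of the second step, here is a shell sketch that assumes the ENA Portal API `filereport` endpoint and its `fastq_ftp` field. The exact requests SRA Explorer makes are not spelled out in this thread, so the endpoint choice and both helper names are assumptions, and the NCBI lookup that produces the run accession is left out.

```shell
# Hypothetical helpers; SRA Explorer's real requests may differ.

# Build an ENA filereport URL asking for the FastQ FTP paths of one run.
ena_filereport_url() {
  printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=fastq_ftp\n' "$1"
}

# Turn a semicolon-separated fastq_ftp value into one ftp:// URL per line.
# Assumes entries may arrive without a scheme, so the scheme is normalised.
split_fastq_ftp() {
  [ -n "$1" ] || return 0
  printf '%s\n' "$1" | tr ';' '\n' | sed -e 's#^ftp://##' -e 's#^#ftp://#'
}

# Example (needs network access and curl):
# curl -s "$(ena_filereport_url SRR279588)"
```

For paired-end runs the field would be expected to hold two paths, which is why the value is split on semicolons before use.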
The ascp Nextflow factory sounds like a sensible idea. It might complicate things as it requires custom software though, whereas the simple URLs presumably work by default with Nextflow's built-in staging mechanisms (but this is a topic for the nextflow repo, not here 😉)
Phil
Hi @ewels
https://sra-explorer.info/# is not returning any results (09-May-2020, 8:22 PM London time).
Probably the EBI/ENA FTP site is down.