Bishop-Laboratory / RLPipes

RLPipes: A standardized R-loop-mapping pipeline.
https://anaconda.org/bioconda/rlpipes
MIT License

Provide direct download option for getting public datasets #80

Closed millerh1 closed 2 years ago

millerh1 commented 2 years ago

The SRA Toolkit is not well maintained, and this leads to recurring issues such as

Instead of using prefetch and fastq-dump, we could download SRA files directly via the ENA FTP. Here is an example of how this is done:

https://github.com/ewels/sra-explorer/blob/7807ca95a07f3d935a335ec7fe913cd713e5ecbc/index.html#L666

In this example, the accession is provided to the ENA API to retrieve the download links. This approach may be slightly slower than the current one, however. Another option would be to use Aspera.
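A minimal sketch of the ENA lookup described above, assuming the ENA Portal API `filereport` endpoint with `result=read_run` and the `fastq_ftp` field. The function names `ena_filereport_url` and `ena_fastq_urls` are hypothetical helpers for illustration, not part of RLPipes. The parser assumes the TSV response has a header row and that multiple files in `fastq_ftp` are separated by `;` with no URL scheme.

```shell
# Build the ENA Portal API "filereport" URL for one run accession.
# Assumption: result=read_run with the fastq_ftp field, returned as TSV.
ena_filereport_url() {
  printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=fastq_ftp&format=tsv\n' "$1"
}

# Read a filereport TSV on stdin and print one download URL per line.
# The fastq_ftp column is located by name from the header row.
ena_fastq_urls() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i
      next
    }
    col && $col != "" {
      n = split($col, parts, ";")
      for (j = 1; j <= n; j++) print "ftp://" parts[j]
    }
  '
}

# Usage (needs curl and network access):
#   curl -fsSL "$(ena_filereport_url SRR15569159)" | ena_fastq_urls | xargs -n 1 curl -fsSLO
```

Note that this route yields FASTQ files directly, so fastq-dump would not be needed afterwards.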

millerh1 commented 2 years ago

Another approach (possibly better) would be to use the AWS S3 download method from SRA, which gives the .sra object.

For example, this currently works:

aws s3 cp s3://sra-pub-run-odp/sra/SRR15569159/SRR15569159 . --no-sign-request
fastq-dump SRR15569159 --split-3

The S3 path pattern seems to always be the same (the accession appears as both the folder and the object name), so this might be a stable solution for the mid-term.
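If the pattern does hold, the two commands could be wrapped roughly like this. This is only a sketch: `sra_s3_uri` and `fetch_sra` are hypothetical names, the bucket layout is assumed from the single example above, and the accession check (SRR/ERR/DRR followed by digits) is an assumption rather than something verified against the bucket.

```shell
# Print the assumed S3 URI for a run accession, or fail if the
# argument does not look like SRR/ERR/DRR followed by digits.
sra_s3_uri() {
  case "$1" in
    [SED]RR*[!0-9]*) echo "not a run accession: $1" >&2; return 1 ;;
    [SED]RR[0-9]*) ;;
    *) echo "not a run accession: $1" >&2; return 1 ;;
  esac
  # Assumed layout: s3://sra-pub-run-odp/sra/<accession>/<accession>
  printf 's3://sra-pub-run-odp/sra/%s/%s\n' "$1" "$1"
}

# Download the .sra object into the current directory, then convert it
# with the same fastq-dump call as in the example above.
fetch_sra() {
  uri="$(sra_s3_uri "$1")" || return 1
  aws s3 cp "$uri" . --no-sign-request && fastq-dump "$1" --split-3
}

# Usage (needs the AWS CLI, the SRA Toolkit, and network access):
#   fetch_sra SRR15569159
```

Keeping the URI construction in its own function means the pattern only has to be changed in one place if it ever stops being stable.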

millerh1 commented 2 years ago

Implemented (I don't remember which commit)