galaxyproject / tools-iuc

Tool Shed repositories maintained by the Intergalactic Utilities Commission
https://galaxyproject.org/iuc
MIT License

fasterq_dump misses the clip option #6171

Open paulzierep opened 1 month ago

paulzierep commented 1 month ago

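We discovered that FASTQ files downloaded from NCBI SRA via fasterq_dump differ from the FASTQ files stored at ENA. After some digging, the cause is probably the --clip option: in the example below, the fasterq_dump read carries two extra bases (and two extra quality values) at its 3' end compared to the ENA read.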

Example downloaded from: https://www.ebi.ac.uk/ena/browser/view/DRR010705

@DRR010705.1 HUMWT9A01AC2YA/4
ATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCATCTTGCGCTCCTTGGTATTCCTTGGAGCATGCCTGTTTGAGTATCATGAGCAAATCTCAAAGTCAATTCCTTAATTGGTTTTGCTTTGGACTTGGAGGTCTTGCAGATTTCACAGTCTGCTCCTCTTAAATGCATTAGCTGGATCTCAGTAATTATGCTTGGTTCCACTCGGCGTGATAAGTATCACTCGCTGAGGACACTGTTAAAAAGGTGGCCAGGAAATTACTGATTGAACCGCTTCTAACGGTCTATTAAGTTGGACAATTGACCCCTTAAGTTTGATCTCAAATCAGGTAGGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACCAACAGGGATTGCCTTAGTAACGGCGGGTGAAGCGGCAACAGCTCAAATTTGAAATCTGGCTCTTTCAGGGTCCGAGTTGTAATTTGTAGAAGT
+
EIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHFIIIIIIIIIIIIIIHBBBHDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHCCEECCBBIIIIIIIIADDIIICCEIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDDCHIIIIIIIIIIIIIIIIIIIIIIHDAADDIIAAA;;AAIAADDACIIAAAA@IIICCAECCICAAACAAAAIBBBBA>>>??@????AA899999;@;;????A?87777<A=:666=<<<;444;<AB996=;AA<<99999<?==;;;8331021..,,,..0..,,,//000.,,,,//1////1186/...1353;8<:7733357:8:777555544841111233310011464333331101440,,,,,.-444221

Default download via fasterq_dump:

@HUMWT9A01AC2YA/4
ATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCATCTTGCGCTCCTTGGTATTCCTTGGAGCATGCCTGTTTGAGTATCATGAGCAAATCTCAAAGTCAATTCCTTAATTGGTTTTGCTTTGGACTTGGAGGTCTTGCAGATTTCACAGTCTGCTCCTCTTAAATGCATTAGCTGGATCTCAGTAATTATGCTTGGTTCCACTCGGCGTGATAAGTATCACTCGCTGAGGACACTGTTAAAAAGGTGGCCAGGAAATTACTGATTGAACCGCTTCTAACGGTCTATTAAGTTGGACAATTGACCCCTTAAGTTTGATCTCAAATCAGGTAGGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACCAACAGGGATTGCCTTAGTAACGGCGGGTGAAGCGGCAACAGCTCAAATTTGAAATCTGGCTCTTTCAGGGTCCGAGTTGTAATTTGTAGAAGTAG
+
EIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHFIIIIIIIIIIIIIIHBBBHDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHCCEECCBBIIIIIIIIADDIIICCEIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDDCHIIIIIIIIIIIIIIIIIIIIIIHDAADDIIAAA;;AAIAADDACIIAAAA@IIICCAECCICAAACAAAAIBBBBA>>>??@????AA899999;@;;????A?87777<A=:666=<<<;444;<AB996=;AA<<99999<?==;;;8331021..,,,..0..,,,//000.,,,,//1////1186/...1353;8<:7733357:8:777555544841111233310011464333331101440,,,,,.-44422100
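
To make the difference between the two records easier to check, here is a minimal Python sketch that tests whether one record is a 3'-truncated copy of the other and reports the extra tail. The helper names `parse_fastq_record` and `clipped_tail` are hypothetical, and the two records in the example are shortened to the last bases of the reads above for readability; they are not the full reads.

```python
def parse_fastq_record(text):
    """Parse one four-line FASTQ record into (header, sequence, quality)."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    if len(lines) != 4 or not lines[0].startswith("@") or not lines[2].startswith("+"):
        raise ValueError("expected a single four-line FASTQ record")
    header, seq, _, qual = lines
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return header[1:], seq, qual


def clipped_tail(clipped, unclipped):
    """Return the (bases, qualities) found only at the 3' end of `unclipped`.

    Returns None when `clipped` is not a prefix of `unclipped`, i.e. when the
    records differ in some way other than a trimmed 3' end.
    """
    _, c_seq, c_qual = clipped
    _, u_seq, u_qual = unclipped
    if not (u_seq.startswith(c_seq) and u_qual.startswith(c_qual)):
        return None
    return u_seq[len(c_seq):], u_qual[len(c_qual):]


# Shortened excerpts (read tails only) of the two records shown above.
ena_record = parse_fastq_record(
    "@DRR010705.1 HUMWT9A01AC2YA/4\nGTAGAAGT\n+\n.-444221"
)
sra_record = parse_fastq_record(
    "@HUMWT9A01AC2YA/4\nGTAGAAGTAG\n+\n.-44422100"
)

print(clipped_tail(ena_record, sra_record))  # ('AG', '00')
```

On these excerpts the only difference is the trailing `AG` with quality `00`, which would be consistent with a right clip being applied on the ENA side. This does not prove that --clip is the cause.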

pavanvidem commented 1 month ago

The option exists in fastq_dump. Maybe copy it over?

paulzierep commented 1 month ago

Bummer: fasterq-dump itself does not have the clip option, so it cannot simply be copied over from fastq_dump.
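
One possible fallback, sketched here under assumptions, is to call fastq-dump with its `--clip` flag for runs where clipping matters. The function names `build_fastq_dump_command` and `run_fastq_dump` are hypothetical and not part of the Galaxy wrapper. Whether `--clip` actually reproduces the ENA output for this run has not been verified.

```python
import shutil
import subprocess


def build_fastq_dump_command(accession, outdir=".", clip=True):
    """Build an argument list for fastq-dump, optionally with --clip."""
    cmd = ["fastq-dump", "--outdir", outdir]
    if clip:
        # --clip asks fastq-dump to apply the clips stored with the run.
        cmd.append("--clip")
    cmd.append(accession)
    return cmd


def run_fastq_dump(accession, outdir=".", clip=True):
    """Run fastq-dump if it is installed; raise otherwise."""
    if shutil.which("fastq-dump") is None:
        raise RuntimeError("fastq-dump not found on PATH")
    subprocess.run(build_fastq_dump_command(accession, outdir, clip), check=True)


# Only prints the command; nothing is downloaded here.
print(" ".join(build_fastq_dump_command("DRR010705")))
# fastq-dump --outdir . --clip DRR010705
```

Downloading the same run with `clip=True` and `clip=False` and comparing the read tails would show whether the option explains the difference.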