ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

`fastq-dump` ignore technical #133

Closed ababaian closed 4 years ago

ababaian commented 4 years ago

See: This page on fastq-dump

Problem: Some libraries contain technical reads, such as barcode indexes or other non-biological information.

Solution: Add the --skip-technical option to fastq-dump so that these reads are excluded.
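As a rough illustration of the proposed change, here is a minimal sketch that assembles a `fastq-dump` command line with `--skip-technical`. The function name `build_fastq_dump_cmd` and the accession `SRR0000000` are hypothetical placeholders, not part of the serratus pipeline, and the sketch only builds the argument list without running anything.

```python
import shlex


def build_fastq_dump_cmd(accession, skip_technical=True, split_files=True):
    """Build an argument list for fastq-dump (hypothetical helper).

    Only flags discussed in this thread are used: --skip-technical
    and --split-files. Nothing is executed here.
    """
    cmd = ["fastq-dump"]
    if skip_technical:
        # Exclude technical reads (barcodes, indexes) from the output.
        cmd.append("--skip-technical")
    if split_files:
        # Write each read of a spot to its own numbered file.
        cmd.append("--split-files")
    cmd.append(accession)
    return cmd


if __name__ == "__main__":
    # "SRR0000000" is a placeholder accession, not a real run.
    print(shlex.join(build_fastq_dump_cmd("SRR0000000")))
```

The resulting list can be passed to `subprocess.run` once the SRA Toolkit is installed and on `PATH`.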

This chicken library is an example of a library with technical reads.

taltman commented 4 years ago

Note that in the case above, using --skip-technical will generate two files when splitting reads into files: <id>_3.fastq and <id>_4.fastq. If your pipeline assumes that the files will be called <id>_1.fastq and <id>_2.fastq, you will need to check for this and rename the files.
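The check-and-rename step described above could look something like the following sketch. It assumes the split files sit in one directory and follow the `<id>_<N>.fastq` pattern; the function name `normalize_split_names` is hypothetical, and the sketch simply renumbers whatever files it finds in ascending order starting from 1.

```python
import re
from pathlib import Path


def normalize_split_names(directory, run_id):
    """Rename <run_id>_<N>.fastq files so they are numbered 1, 2, ...

    Hypothetical helper: files are renumbered in ascending order of N,
    so <id>_3.fastq and <id>_4.fastq become <id>_1.fastq and <id>_2.fastq.
    Returns the list of resulting paths.
    """
    pattern = re.compile(re.escape(run_id) + r"_([1-9]\d*)\.fastq")
    found = []
    for path in Path(directory).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            found.append((int(match.group(1)), path))
    found.sort()

    result = []
    for new_index, (old_index, path) in enumerate(found, start=1):
        target = path.with_name(f"{run_id}_{new_index}.fastq")
        if new_index != old_index:
            # Processing in ascending order means the target name
            # has already been vacated by an earlier rename.
            path.rename(target)
        result.append(target)
    return result
```

Files that are already named `<id>_1.fastq` and `<id>_2.fastq` are left untouched, so the function is safe to call on every run regardless of whether technical reads were present.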

ababaian commented 4 years ago

The --split-e option only deals with biological reads to begin with, so this mistake was inadvertently avoided. It looks like that spatial transcriptome library, SRR11616465, was just a particular case where the reads were mis-annotated.

     --split-e                     3-way splitting for mate-pairs. For each
                                     spot, if there are two biological reads
                                     satisfying filter conditions, the first
                                     is placed in the `*_1.fastq` file, and
                                     the second is placed in the `*_2.fastq`
                                     file. If there is only one biological
                                     read satisfying the filter conditions,
                                     it is placed in the `*.fastq` file. All
                                     other reads in the spot are ignored.
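
To make the help text above concrete, here is a small Python model of the routing it describes. This is only a sketch of the documented behaviour, not how `fastq-dump` is implemented: the function name `split_e_route` and the `(kind, sequence)` representation of a spot are assumptions, filter conditions are not modelled, and spots with zero or more than two biological reads produce no output here because the help text does not spell out that case.

```python
def split_e_route(spot):
    """Model which output file each read of a spot would go to.

    `spot` is a list of (kind, sequence) tuples, where kind is
    "biological" or "technical" (an assumed representation).
    Returns a dict mapping a file suffix to the sequence written there.
    """
    # Technical reads are never considered by --split-e.
    biological = [seq for kind, seq in spot if kind == "biological"]

    if len(biological) == 2:
        # Paired spot: first read to *_1.fastq, second to *_2.fastq.
        return {"_1.fastq": biological[0], "_2.fastq": biological[1]}
    if len(biological) == 1:
        # Single biological read goes to the unnumbered *.fastq file.
        return {".fastq": biological[0]}
    # ASSUMPTION: any other count is not described by the help text,
    # so this sketch writes nothing for such spots.
    return {}


if __name__ == "__main__":
    spot = [
        ("technical", "ACGTACGT"),   # e.g. a barcode read
        ("biological", "GGGCCCAA"),
        ("biological", "TTTAAACC"),
    ]
    print(split_e_route(spot))
```

Under this model the barcode read in the example never reaches an output file, which is consistent with the observation that `--split-e` sidesteps the `_3`/`_4` naming issue.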