FASTQ position usage by Salmon

COMBINE-lab / salmon



This is not exactly a bug, but a comment and a question about how Salmon uses the position data in FASTQ files. We had a series of RNA-seq samples in which the majority of the reads had their position listed as 0:0 in the FASTQ read headers. We think this comes from some obscure issue in one of the trimming/demultiplexing pipelines. No one noticed, as these data are not generally used, but it did cause an error in RSEM. Luckily, this error had been reported previously.
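For anyone wanting to check whether their own files are affected, here is a minimal sketch (not part of Salmon, STAR, or RSEM) that counts reads whose header coordinates are 0:0. It assumes the CASAVA 1.8+ Illumina header layout `@instrument:run:flowcell:lane:tile:x:y [read:filter:control:index]`, so older `...#0/1`-style headers will not be recognized. Because the x:y coordinates are part of what makes an Illumina read name unique, zeroing them can also leave several reads with identical names, so the sketch counts duplicated read IDs as well. The helper names (`open_fastq`, `read_ids`, `has_zero_coords`, `summarize`) are hypothetical.

```python
import gzip
import io
from collections import Counter
from typing import Iterator, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_ids(handle: TextIO) -> Iterator[str]:
    """Yield the read ID: the first whitespace-delimited token of each header, without '@'.

    Assumes strict 4-line FASTQ records (header, sequence, '+', qualities).
    """
    for i, line in enumerate(handle):
        if i % 4 == 0:  # header line of each record
            yield line[1:].split()[0]


def has_zero_coords(read_id: str) -> bool:
    """True if the x:y fields (the last two ':'-separated fields) are both '0'.

    Assumes the CASAVA 1.8+ layout instrument:run:flowcell:lane:tile:x:y.
    """
    fields = read_id.split(":")
    return len(fields) >= 7 and fields[-2] == "0" and fields[-1] == "0"


def summarize(handle: TextIO) -> dict:
    """Count total reads, reads with 0:0 coordinates, and reads whose ID is not unique."""
    ids = list(read_ids(handle))
    counts = Counter(ids)
    return {
        "reads": len(ids),
        "zero_coords": sum(has_zero_coords(r) for r in ids),
        "duplicate_ids": sum(c for c in counts.values() if c > 1),
    }


if __name__ == "__main__":
    # Tiny in-memory example: two reads zeroed to 0:0 on the same tile, one normal read.
    example = io.StringIO(
        "@EAS139:136:FC706VJ:2:2104:0:0 1:N:0:ATCACG\nACGT\n+\nIIII\n"
        "@EAS139:136:FC706VJ:2:2104:0:0 1:N:0:ATCACG\nTTGA\n+\nIIII\n"
        "@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:0:ATCACG\nGGCC\n+\nIIII\n"
    )
    print(summarize(example))
    # {'reads': 3, 'zero_coords': 2, 'duplicate_ids': 2}
```

On a real file, `summarize(open_fastq("sample_R1.fastq.gz"))` (file name hypothetical) gives the same three counts; a large `duplicate_ids` value alongside `zero_coords` would show that the zeroed coordinates have also collapsed read names.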

Notably, Salmon in quasi-mapping mode handled these files fine. It was only when I reran it on STAR-aligned BAM files that I noticed Salmon used only the reads not listed at 0:0 (STAR itself does not seem to care one way or the other). Badly formatted FASTQ files are obviously not a bug, and we are working on fixing them, but we were curious why the position data was being used in alignment mode but not in quasi-mapping mode. Moreover, why is it used at all? Is it used to weed out potential artifacts?

Many thanks, and I am happy to share an example file if you are interested.


FASTQ position usage by Salmon #101