McMahonLab / geodes

Diel transcriptomics of freshwater lakes

Reconsider BWA for read mapping #15

Closed: joshamilton closed this issue 7 years ago

joshamilton commented 7 years ago

The expression level of duplicated genes, or of genes with repeat regions, may be overestimated when using BWA. If a read aligns to multiple regions, BWA records the read as mapping to each of them, with a reduced "map-score". However, HTSeq doesn't take this map-score into account when counting reads, so the number of mapped reads is overestimated: a read that maps to multiple places is counted multiple times.

@celawson87 ran into this problem with the anammox genomes. To get around it, we used BBMap, which lets you randomly assign a multi-mapping read to a single site. This way, each read is counted only once by HTSeq.
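
To make the counting difference concrete, here is a minimal toy sketch in Python. It does not call BWA, BBMap, or HTSeq; the read names, gene names, and both helper functions are hypothetical and only illustrate the two strategies described above: counting every alignment of a multi-mapped read versus assigning each read to one randomly chosen site. In BBMap itself, the random assignment is, as far as I know, selected with the `ambiguous=random` setting.

```python
import random
from collections import Counter

# Hypothetical toy data: each read lists every gene it aligns to.
# read2 and read3 fall in a duplicated gene, so they have two hits each.
alignments = {
    "read1": ["geneA"],
    "read2": ["geneB", "geneB_copy"],
    "read3": ["geneB", "geneB_copy"],
    "read4": ["geneC"],
}

def count_all_hits(alignments):
    """Count every reported alignment, so multi-mapped reads count repeatedly."""
    counts = Counter()
    for genes in alignments.values():
        counts.update(genes)
    return counts

def count_random_single_hit(alignments, seed=0):
    """Assign each read to one randomly chosen hit, so each read counts once."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter()
    for genes in alignments.values():
        counts[rng.choice(genes)] += 1
    return counts

naive = count_all_hits(alignments)
single = count_random_single_hit(alignments)

print(sum(naive.values()))   # 6 counted alignments from only 4 reads
print(sum(single.values()))  # 4, one count per read
```

With four reads, the first strategy reports six counts because the two multi-mapped reads are each counted twice, while the second always reports exactly one count per read. Which copy of the duplicated gene receives a given read is random, so per-copy counts are only meaningful in aggregate.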

alexlinz commented 7 years ago

Thanks Josh! @sstevens2 agrees, too. I'm going to switch from BWA to BBMap for the next run.