The expression level of duplicated genes, or of genes with repeat regions, may be over-estimated when reads are mapped with BWA. If a read aligns to multiple regions, BWA records it as mapping to each of them and flags the ambiguity with a reduced mapping quality ("map-score"). However, HTSeq doesn't take this map-score into account when counting reads, so reads that map to multiple places are counted multiple times and the number of mapped reads is over-estimated.
@celawson87 ran into this problem with the anammox genomes. To get around it, we used BBMap, which can randomly assign a multi-mapping read to a single site. This way, each read is counted only once by HTSeq.
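
The difference between the two counting behaviours can be illustrated with a toy sketch in plain Python. This is not HTSeq or BBMap code: the read names, gene names, and both helper functions are hypothetical and made up for illustration. In BBMap itself, the random assignment should correspond to the `ambiguous=random` setting, but check the BBMap documentation for your version before relying on that.

```python
import random
from collections import Counter

# Hypothetical toy data: each read lists the genes it aligns to.
# read2 and read3 are multi-mapped, e.g. because geneA and geneB are duplicates.
alignments = {
    "read1": ["geneA"],
    "read2": ["geneA", "geneB"],
    "read3": ["geneA", "geneB"],
    "read4": ["geneC"],
}

def count_every_alignment(alignments):
    """Count a read once for every site it aligns to (the over-counting case)."""
    counts = Counter()
    for genes in alignments.values():
        counts.update(genes)
    return counts

def count_random_single_site(alignments, seed=0):
    """Assign each read to one randomly chosen site, so it is counted once."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter()
    for read in sorted(alignments):
        counts[rng.choice(alignments[read])] += 1
    return counts

naive = count_every_alignment(alignments)
single = count_random_single_site(alignments)

# Four reads produce six counts when every alignment is counted,
# but exactly four counts when each read is assigned to a single site.
print("every alignment:", dict(naive), "total =", sum(naive.values()))
print("single site:    ", dict(single), "total =", sum(single.values()))
```

With random assignment the total count always equals the number of reads, although how the ambiguous reads are split between geneA and geneB depends on the random draw.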