brentp / bwa-meth

fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome
https://arxiv.org/abs/1401.1129
MIT License

mismatch filter #30

Open jklughammer opened 7 years ago

jklughammer commented 7 years ago

I added a mismatch filter. There is now a new parameter (mismatch_ratio) which specifies the maximum acceptable ratio of mismatches to alignment length. Reads with too many mismatches are reported as QC-failed (0x200) and unmapped (0x4). Chromosome and mapping position are set to * and 0, respectively, and the originally reported values are stored in two extra fields, MC:Z: (chromosome) and MP:Z: (position). I implemented this because I need more control over which reads are considered "mapped". Might be useful for others, too.
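A minimal sketch of the filter described above, applied to one tab-separated SAM record. This is not the PR's code. The helper names are hypothetical, mismatches are assumed to come from the NM:i tag, and "alignment length" is assumed to be the number of query bases in M, I, = and X CIGAR operations. The MC/MP tag names follow the PR; note that the SAM optional-fields specification already uses MC for the mate's CIGAR string.

```python
import re

# Parses CIGAR strings such as "5S10M2I3M".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Assumption: "alignment length" = query bases in these operations.
ALIGNED_OPS = {"M", "I", "=", "X"}


def aligned_length(cigar):
    """Number of query bases covered by M/I/=/X CIGAR operations."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in ALIGNED_OPS)


def apply_mismatch_filter(sam_line, mismatch_ratio):
    """Hypothetical helper: flag a SAM record as QC-failed and unmapped when
    NM / aligned length exceeds mismatch_ratio."""
    line = sam_line.rstrip("\n")
    if mismatch_ratio >= 1 or line.startswith("@"):
        return line  # filter is opt-in; headers pass through untouched
    fields = line.split("\t")
    flag = int(fields[1])
    if flag & 0x4:
        return line  # already unmapped
    tags = {f[:2]: f for f in fields[11:]}
    length = aligned_length(fields[5])
    if "NM" not in tags or length == 0:
        return line  # nothing to compute the ratio from
    mismatches = int(tags["NM"].split(":")[2])
    if mismatches / length <= mismatch_ratio:
        return line
    # Keep the original placement in MC:Z / MP:Z, as described in the PR.
    fields.append("MC:Z:" + fields[2])
    fields.append("MP:Z:" + fields[3])
    fields[1] = str(flag | 0x200 | 0x4)
    fields[2], fields[3] = "*", "0"
    return "\t".join(fields)
```

For a 10M alignment with NM:i:3 and a ratio of 0.2, the record's FLAG becomes 516 (0x200 | 0x4), RNAME/POS become * and 0, and MC:Z:chr1 / MP:Z:100 are appended.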

brentp commented 7 years ago

can you make this opt-in, so that the default is 1, meaning it has no effect and only do any extra work if the value is < 1?

also, please make the argument "--mismatch-ratio" instead of "--mismatch_ratio".
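A hypothetical argparse fragment showing the opt-in behaviour requested here: the flag defaults to 1, which disables the filter, and argparse exposes `--mismatch-ratio` as `args.mismatch_ratio`. Only the flag name comes from the thread; the function names are illustrative.

```python
import argparse


def parse_args(argv):
    # Hypothetical parser fragment; only the flag name comes from the thread.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--mismatch-ratio", type=float, default=1.0,
        help="maximum mismatches / alignment length; 1 (default) disables the filter")
    return parser.parse_args(argv)


def mismatch_filter_enabled(argv):
    # Only do the extra per-read work when a ratio below 1 is requested.
    return parse_args(argv).mismatch_ratio < 1
```

With no arguments the filter stays off; `mismatch_filter_enabled(["--mismatch-ratio", "0.05"])` turns it on.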

jklughammer commented 7 years ago

done.

brentp commented 7 years ago

what's with the MC and MP tags?

jklughammer commented 7 years ago

stores the original mapping chromosome and position: MC:Z: (chromosome) and MP:Z: (position), just so that it's not lost. Do you think it's problematic for downstream analysis?

brentp commented 7 years ago

I guess that's fine. I would prefer not to set CHROM and POS to bad values, just leave the originals and set the flag.

jklughammer commented 7 years ago

I need CHROM and POS to be set to * and 0 for what I do downstream. Also, I think some tools might be confused if the flag says "unmapped" but there are still mapping positions.

brentp commented 7 years ago

I don't think I want to set POS and CHROM. Can you filter on the flag, rather than on the mapping?
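A sketch of the flag-based filtering suggested here, using a hypothetical `summarize` helper: reads with 0x4 (unmapped) or 0x200 (QC-failed) set are counted as unmapped whether or not RNAME and POS were reset, so the counts come out the same under either version of the filter.

```python
from collections import Counter


def summarize(sam_lines):
    """Hypothetical helper: count mapped reads per chromosome, treating any
    record with 0x4 (unmapped) or 0x200 (QC-failed) set as unmapped."""
    per_chrom = Counter()
    unmapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (0x4 | 0x200):
            unmapped += 1  # decided by the flag, not by RNAME/POS
        else:
            per_chrom[fields[2]] += 1
    return per_chrom, unmapped
```

A record with FLAG 516 is counted as unmapped whether its RNAME is * or the original chromosome.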

jklughammer commented 7 years ago

I do filter on the flag, but I also summarize mapping positions. And I want the reads with too many mismatches to be counted as unmapped. To me it makes more sense this way. Do you have any specific concerns about changing chrom and pos? If yes I might reconsider. In any case if you prefer to keep the mapping positions just change it after merging - or don't merge if you think it's not generally useful (: