By visually inspecting the alignments (using `samtools tview`), some regions seem to have bad alignments. The reason seems to be that `MarkDuplicates` changes the sequence in the aligned file. For example, before `MarkDuplicates` we have:

And after `MarkDuplicates` we have:

The number of matches is identical; however, `MarkDuplicates` adds 150 mismatches, and the changed sequence in column 10 (the SEQ field) is the one shown by `samtools tview`. This behaviour does not affect all genome regions, and it is not clear how it affects the calling process. `MarkDuplicates` should be removed as described in #71.
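
To check how widespread the problem is beyond visual inspection, something like the following could compare column 10 (SEQ) of each read before and after duplicate marking. This is only a rough sketch, not part of the pipeline: it assumes both alignments have been exported as SAM text (for example with `samtools view`), and the function names `read_seqs` and `changed_reads` are made up for illustration.

```python
def read_seqs(sam_lines):
    """Map (read name, mate number) -> SEQ for primary records in SAM text lines."""
    seqs = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        # 0x40 = first in pair, 0x80 = second in pair, otherwise unpaired
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        seqs[(fields[0], mate)] = fields[9]  # column 10 is SEQ
    return seqs


def changed_reads(before_lines, after_lines):
    """Return the sorted keys of reads whose SEQ differs between the two inputs."""
    before = read_seqs(before_lines)
    after = read_seqs(after_lines)
    shared = before.keys() & after.keys()
    return sorted(key for key in shared if before[key] != after[key])
```

Calling `changed_reads(open("before.sam"), open("after.sam"))` (hypothetical file names) would list the reads whose sequence was altered, which may help to tell which regions are affected.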