cbg-ethz / ConsensusFixer

Computes a consensus sequence with wobbles, ambiguous bases, and in-frame insertions, from an NGS read alignment.
GNU General Public License v3.0

deletions/uncovered regions are not present/reported #6

Closed tolot27 closed 6 years ago

tolot27 commented 6 years ago

I have a low-coverage read set that I mapped to a reference. Some genomic regions have a coverage of 0, and I would expect Ns at those positions in the consensus sequence, but none are present, even when I adjust -pluralityN.

tolot27 commented 6 years ago

The alignment statistics file (statistics.txt, produced with the undocumented parameter --stats) lists every reference position that is covered. Positions that are not listed, i.e. the uncovered ones, should produce Ns in the consensus sequence.

armintoepfer commented 6 years ago

Feel free to issue a PR. Nobody is working actively on the code base.

armintoepfer commented 6 years ago

If there is no coverage, nothing should be reported. Feel free to add a mode that adds Ns for missing bases.

tolot27 commented 6 years ago

The rationale behind this issue is that a concatenated consensus sequence carries no information about missing bases. A user might therefore believe there is a deletion where none is actually present.
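
To make the difference concrete, here is a minimal Python sketch of the requested behaviour. It is not ConsensusFixer code and not the implementation from any PR. The function name `pad_consensus`, the dict of per-position base calls, and the toy data are all hypothetical, and the sketch ignores insertions relative to the reference.

```python
def pad_consensus(calls, ref_length, fill="N"):
    """Build a full-length consensus from per-position base calls.

    calls      -- dict mapping 0-based reference position -> consensus base
                  (hypothetical input; only covered positions are present)
    ref_length -- length of the reference sequence
    fill       -- character emitted for positions without coverage
    """
    return "".join(calls.get(pos, fill) for pos in range(ref_length))


# Toy example: positions 3-5 of a 9-base reference have no coverage.
calls = {0: "A", 1: "C", 2: "G", 6: "T", 7: "A", 8: "C"}

# Concatenating only the covered positions looks like a 3-base deletion.
concatenated = "".join(calls[pos] for pos in sorted(calls))

# Padding keeps reference coordinates and marks the gap explicitly.
padded = pad_consensus(calls, ref_length=9)

print(concatenated)  # ACGTAC
print(padded)        # ACGNNNTAC
```

In the concatenated string the uncovered stretch is indistinguishable from a real deletion, while the padded string keeps the reference length and shows where data is missing.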

nahmadsen commented 6 years ago

Hi, I have created a PR for this issue. I hope it is useful.