[Question] How are POS, LEN and SEQ of insertions determined

waltergallegog commented 9 months ago

Dear developer, I would like to understand how nanomonsv currently determines the position, length and sequence of an insertion, especially when the reads vary a lot. For example, for the insertion in the image, there are 29 reads, with insertion lengths ranging from 166 to 343. (nanomonsv reports only 25 supporting reads, so I guess the "outlier" reads are not considered.)

The insertion is called by nanomonsv with length 297.

Does nanomonsv choose one of the reads as a representative, or is the inserted sequence the result of a consensus across all reads?

Thank you very much for your support.

friend1ws commented 9 months ago

Nanomonsv gathers the supporting reads with insertions, creates consensus reads from them using multiple alignment or Racon, and then realigns the consensus to the reference genome to determine the position and the inserted sequence.

As you pointed out, during the gathering of the supporting reads, Nanomonsv tries to remove some 'outliers'.
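
To make the two steps above concrete, here is a toy sketch in Python. It is not nanomonsv's code. The outlier rule (a median absolute deviation cutoff on insertion length), the function names, and the assumption that the reads are already given as equal-length gapped alignment rows are all hypothetical, chosen only to illustrate the idea of "filter outliers, then take a consensus". The real tool builds the consensus with a multiple aligner or Racon and then realigns it to the reference.

```python
from collections import Counter
from statistics import median


def drop_length_outliers(lengths, max_dev=3.0):
    """Keep lengths within max_dev * MAD of the median.

    Hypothetical rule for illustration; nanomonsv's actual criteria may differ.
    """
    med = median(lengths)
    # Fall back to 1.0 so identical lengths do not give a zero-width window.
    mad = median(abs(x - med) for x in lengths) or 1.0
    return [x for x in lengths if abs(x - med) <= max_dev * mad]


def column_consensus(aligned_rows, gap="-"):
    """Majority vote per column over equal-length gapped alignment rows.

    Columns where the gap symbol wins are dropped from the consensus.
    Ties go to the symbol seen first in that column.
    """
    if len({len(row) for row in aligned_rows}) != 1:
        raise ValueError("all alignment rows must have the same length")
    consensus = []
    for column in zip(*aligned_rows):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Made-up insertion lengths: one read is far shorter than the rest.
kept = drop_length_outliers([290, 295, 297, 300, 302, 166])

# Made-up gapped rows standing in for a multiple alignment of inserted sequences.
inserted_seq = column_consensus(["ACG-T", "ACGAT", "AC--T"])
```

With a consensus approach like this, the reported length is the length of the consensus sequence after realignment, so it does not have to equal the insertion length of any single read.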

waltergallegog commented 9 months ago

ok got it, thanks for the info.

friend1ws / nanomonsv

[Question] How are POS, LEN and SEQ of insertions determined #53