NBISweden / IgDiscover-legacy

Analyze antibody repertoires and discover new V genes from high-throughput sequencing reads
https://www.igdiscover.se
MIT License

Increase required V coverage #95

Closed marcelm closed 10 months ago

marcelm commented 5 years ago

The preprocessing filter (igdiscover filter) currently keeps assignments with 90% V coverage or more. @mateuszatki increased this setting and was able to avoid some artifacts. The problem is that a V match that is too short can be counted towards the wrong gene/allele when counting exact occurrences. To count as an exact occurrence, it is sufficient that the covered part of the V is identical to the novel V, so any differences in the non-covered part are ignored.
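To illustrate the problem, here is a minimal sketch with toy sequences. The function names (`v_coverage`, `passes_filter`, `is_exact_occurrence`) and the sequences are made up for this example and are not IgDiscover's actual code. It only shows how a read that covers 90% of the V can count as an exact occurrence of two alleles that differ in the uncovered 3' end.

```python
# Toy sequences, NOT real V genes. The two "alleles" differ only in the last base.
ALLELE_A = "ACGTACGTACGTACGTACGA"
ALLELE_B = "ACGTACGTACGTACGTACGT"


def v_coverage(covered_length: int, reference_length: int) -> float:
    """Percentage of the reference V sequence covered by the alignment."""
    return 100.0 * covered_length / reference_length


def passes_filter(coverage: float, threshold: float = 90.0) -> bool:
    """Keep an assignment if its V coverage is at least the threshold.

    The default of 90 mirrors the setting described above; the function
    itself is a hypothetical stand-in for the real filter.
    """
    return coverage >= threshold


def is_exact_occurrence(read_v: str, candidate: str, ref_start: int = 0) -> bool:
    """True if the aligned part of the read is identical to the part of the
    candidate it covers. Candidate bases outside the covered part are ignored.
    """
    covered = candidate[ref_start:ref_start + len(read_v)]
    return len(covered) == len(read_v) and read_v == covered


# A read whose V alignment stops two bases before the 3' end of the reference.
short_read = ALLELE_A[:18]
coverage = v_coverage(len(short_read), len(ALLELE_A))

print("coverage:", coverage)
print("kept at 90%:", passes_filter(coverage, 90.0))
print("kept at 97%:", passes_filter(coverage, 97.0))
print("exact for A:", is_exact_occurrence(short_read, ALLELE_A))
print("exact for B:", is_exact_occurrence(short_read, ALLELE_B))
```

With the 90% threshold, the short read is kept and is indistinguishable between the two toy alleles. A full-length read would match only one of them.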

I have run a small test to see how many rows remain in dataset ERR1760498 at various filter settings. This is the result:

| Required V coverage (%) | Rows remaining |
|---|---|
| 90 | 513505 |
| 94 | 513464 |
| 96 | 513412 |
| 97 | 513256 |
| 98 | 507612 |
| 99 | 468679 |
| 100 | 183050 |

So going to 97% is not a problem at all in this dataset (only 249 rows fewer than at 90%, under 0.05%) and even 98% is fine (5893 rows fewer, about 1.1%).
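The loss at each setting relative to the current 90% threshold can be worked out from the numbers in the table above. This is just arithmetic on the reported row counts; the helper names (`rows_lost`, `percent_lost`) are made up for the example.

```python
# Rows remaining in dataset ERR1760498 at each required V coverage (%),
# copied from the table above.
ROWS_REMAINING = {
    90: 513505,
    94: 513464,
    96: 513412,
    97: 513256,
    98: 507612,
    99: 468679,
    100: 183050,
}


def rows_lost(threshold: int, baseline: int = 90) -> int:
    """Rows removed when raising the threshold from the baseline setting."""
    return ROWS_REMAINING[baseline] - ROWS_REMAINING[threshold]


def percent_lost(threshold: int, baseline: int = 90) -> float:
    """Rows removed, as a percentage of the rows kept at the baseline."""
    return 100.0 * rows_lost(threshold, baseline) / ROWS_REMAINING[baseline]


for threshold in sorted(ROWS_REMAINING):
    print(f"{threshold:>3}%  lost {rows_lost(threshold):>6} rows "
          f"({percent_lost(threshold):.2f}%)")
```

The jump in lost rows between 98% and 99%, and again at 100%, is what makes 97% or 98% look like the safer choices for this dataset.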

Whether to force-extend all V alignments up to the last 3' nucleotide of the reference sequence should be considered in a separate issue.