IKIM-Essen / uncovar

Transparent and robust SARS-CoV-2 variant calling and lineage assignment with comprehensive reporting.
https://ikim-essen.github.io/uncovar
BSD 2-Clause "Simplified" License

VOC similarity table #470

Closed by thomasbtf 2 years ago

thomasbtf commented 2 years ago

Version

Bug Report

In the new VoC similarity table, the VoC with the highest similarity for the Illumina sample is 20A S 126A. According to CoV Lineages, this is B.1.620 (Congo).

For the Nanopore sample, 21K Omicron is rated at 54% similarity.

Both results are surprising: the Illumina test sample consists of reads simulated from a B.1.1.7 genome, and the Nanopore test sample was downloaded from EMBL-EBI a long time ago (around October).

In the current definition of the Jaccard indices, only the lineage-defining variants contribute to the numerator. This creates a bias. The numerator could also include the variants that explicitly do not belong to a lineage. Such variants need a different weight, namely 1 - PROB_PRESENT.
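The proposal can be illustrated with a minimal Python sketch. It is not the UnCoVar implementation: the function names, the variant labels, and in particular the denominator (the number of variants in the union of observed and lineage-defining variants) are assumptions made for illustration only. `prob_present` stands in for a mapping from variant to its PROB_PRESENT value in the sample.

```python
def similarity_defining_only(prob_present, lineage_variants):
    """Score where only lineage-defining variants enter the numerator.

    Assumption: the denominator is the size of the union of all variants
    seen in the sample and all lineage-defining variants.
    """
    lineage_variants = set(lineage_variants)
    universe = set(prob_present) | lineage_variants
    if not universe:
        return 0.0
    # Variants missing from the sample count as PROB_PRESENT = 0.
    numerator = sum(prob_present.get(v, 0.0) for v in lineage_variants)
    return numerator / len(universe)


def similarity_with_absent(prob_present, lineage_variants):
    """Score that also rewards non-lineage variants being absent.

    Lineage-defining variants contribute PROB_PRESENT; all other variants
    contribute 1 - PROB_PRESENT, as suggested in the issue text.
    """
    lineage_variants = set(lineage_variants)
    universe = set(prob_present) | lineage_variants
    if not universe:
        return 0.0
    numerator = 0.0
    for variant in universe:
        p = prob_present.get(variant, 0.0)
        numerator += p if variant in lineage_variants else 1.0 - p
    return numerator / len(universe)


# Hypothetical sample: variants A and B are present, C and D are absent.
probs = {"A": 1.0, "B": 1.0, "C": 0.0, "D": 0.0}

print(similarity_defining_only(probs, {"A", "B"}))  # 0.5
print(similarity_with_absent(probs, {"A", "B"}))    # 1.0
print(similarity_defining_only(probs, {"C"}))       # 0.0
print(similarity_with_absent(probs, {"C"}))         # 0.25
```

In this toy case the sample matches the lineage defined by A and B exactly, yet the first score only reaches 0.5 because the correctly absent variants C and D contribute nothing to the numerator. The second score credits them with 1 - PROB_PRESENT.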

Minimal Example

See https://github.com/IKIM-Essen/uncovar/suites/5170544594/artifacts/156821233.

Logs

N/A

Additional context

[image]

thomasbtf commented 2 years ago

Solved