In the new VoC similarity table, the VoC with the highest similarity for the Illumina sample is 20A S 126A. According to CoV Lineages, this corresponds to B.1.620 (Congo).
For the Nanopore sample, 21K Omicron is rated with 54% similarity.
Both results are surprising, because the test samples are either simulated reads of a B.1.1.7 genome (Illumina) or were downloaded from EMBL-EBI a long time ago (~October) (Nanopore).
The numerator of the Jaccard index currently only takes the lineage-defining variants into account. This creates a bias. The numerator could also include the variants that explicitly do not belong to a lineage. Such variants would of course need a different weight, namely 1 - PROB_PRESENT.
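A minimal sketch of the proposed weighting, not the actual uncovar implementation. The function name `similarity`, its parameters, and the choice of denominator are assumptions made for illustration; only the weight 1 - PROB_PRESENT for variants outside the lineage comes from the proposal above.

```python
def similarity(prob_present, lineage_variants, all_variants, include_absent=True):
    """Score how well a sample matches a lineage (hypothetical helper).

    prob_present: dict mapping variant name -> PROB_PRESENT in the sample
    lineage_variants: set of variants that define the lineage
    all_variants: set of all variants considered across lineages
    include_absent: if True, variants that do not belong to the lineage
        contribute 1 - PROB_PRESENT to the numerator
    """
    # Lineage-defining variants count with their presence probability.
    score = sum(prob_present.get(v, 0.0) for v in lineage_variants)
    total = len(lineage_variants)

    if include_absent:
        # Variants outside the lineage count with the complementary weight.
        others = all_variants - lineage_variants
        score += sum(1.0 - prob_present.get(v, 0.0) for v in others)
        # Assumption: normalise by the number of variants considered.
        total = len(all_variants | lineage_variants)

    return score / total if total else 0.0


# Toy data: the sample carries variants A and B, but not C and D.
prob_present = {"A": 1.0, "B": 1.0, "C": 0.0, "D": 0.0}
all_variants = {"A", "B", "C", "D"}

small_lineage = {"A"}       # defined by a single variant
true_lineage = {"A", "B"}   # matches the sample exactly

print(similarity(prob_present, small_lineage, all_variants, include_absent=False))
print(similarity(prob_present, small_lineage, all_variants))
print(similarity(prob_present, true_lineage, all_variants))
```

In this toy case, the lineage defined by a single variant scores as high as the exact match when only lineage-defining variants are counted. Once the remaining variants are included with weight 1 - PROB_PRESENT, it scores lower, because the sample carries variant B, which that lineage does not have.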
Version
Bug Report
Minimal Example
See https://github.com/IKIM-Essen/uncovar/suites/5170544594/artifacts/156821233.
Logs
N/A
Additional context