blab / cartography

Dimensionality reduction distills complex evolutionary relationships in seasonal influenza and SARS-CoV-2
https://doi.org/10.1101/2024.02.07.579374
MIT License

Quantify ability of SARS-CoV-2 embeddings to capture recombinant lineages relative to parental lineages #62

huddlej closed 11 months ago

huddlej commented 12 months ago

A clear gap in the current structure of the paper is that we don't quantify how well SARS-CoV-2 embeddings capture relationships between recombinant lineages and their parental lineages. We quantify the ability of flu embeddings to capture reassortment, so we should do the same for SARS-CoV-2.

Recombination is more complicated than reassortment, since recombinant genomes are not discrete combinations of segments from existing lineages but mixtures of the parental lineages that can vary continuously depending on where the breakpoints fall.

One approach could be to quantify the pattern we already observe when we visually inspect embeddings of two parental lineages and their recombinant offspring. In practice, we look for the recombinant samples to place "between" the two parental lineages in an embedding. We could quantify this placement with three average pairwise distances: between samples from parental lineages A and B, between samples from recombinant lineage X and lineage A, and between samples from X and lineage B. We expect A and B to map farther apart from each other on average than A and X or B and X.
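
A minimal sketch of that comparison, assuming each lineage's samples are available as rows of embedding coordinates and that Euclidean distance in the embedding is the distance of interest. The function names and the toy coordinates are hypothetical and only illustrate the idea; this is not the code from the commits that resolved the issue.

```python
import numpy as np


def mean_pairwise_distance(first, second):
    """Mean Euclidean distance between every sample in `first` and every sample in `second`."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    # Broadcast to an (n_first, n_second, n_dimensions) array of coordinate differences.
    differences = first[:, None, :] - second[None, :, :]
    return float(np.sqrt((differences ** 2).sum(axis=-1)).mean())


def compare_recombinant_to_parents(parent_a, parent_b, recombinant):
    """Compare the A-B distance to the A-X and B-X distances (hypothetical helper)."""
    d_ab = mean_pairwise_distance(parent_a, parent_b)
    d_ax = mean_pairwise_distance(parent_a, recombinant)
    d_bx = mean_pairwise_distance(parent_b, recombinant)
    return {
        "A-B": d_ab,
        "A-X": d_ax,
        "B-X": d_bx,
        # The expectation from the issue: parents are farther from each other
        # than either parent is from the recombinant lineage.
        "between": d_ax < d_ab and d_bx < d_ab,
    }


# Toy 2D embedding coordinates (made up for illustration, not real data).
parent_a = np.array([[0.0, 0.0], [0.0, 1.0]])
parent_b = np.array([[10.0, 0.0], [10.0, 1.0]])
recombinant = np.array([[5.0, 0.0], [5.0, 1.0]])

result = compare_recombinant_to_parents(parent_a, parent_b, recombinant)
print(result)
```

With real data, the three arrays would come from filtering an embedding table by lineage annotation. The same comparison could be repeated per embedding method to see which methods place recombinants between their parents most consistently.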

huddlej commented 11 months ago

Resolved by 1639a47c790bddbcb0a1990a008a16b19ac9ba9e and 03c47e68fde3286fedacbf1fd400502078636f45.