A clear gap in the current structure of the paper is the lack of any quantification of how well SARS-CoV-2 embeddings capture relationships between recombinant lineages and their parental lineages. We quantify the ability of flu embeddings to capture reassortment, so we should do the same for recombination in SARS-CoV-2.
Recombination is more complicated than reassortment. Reassortants are discrete combinations of segments from existing lineages, whereas recombinant genomes combine the parental lineages in varying proportions that depend on where the breakpoints fall.
One approach could be to quantify the pattern we already look for when visually inspecting embeddings of two parental lineages and their recombinant offspring. In practice, we look for the recombinant samples to place "between" the two parental lineages in an embedding. We could quantify this placement by calculating the average pairwise distance between samples from parental lineages A and B, and the average pairwise distances between samples from the recombinant lineage X and samples from each of A and B. We expect A and B to map farther apart from each other on average than either A and X or B and X.
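The distance comparison described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the analysis itself: the function names (mean_between_group_distance, recombinant_placement) are hypothetical, Euclidean distance in the embedding space is an assumed choice of metric, and the coordinates are synthetic toy data rather than real embeddings.

```python
import numpy as np


def mean_between_group_distance(first, second):
    """Mean Euclidean distance over all pairs with one sample from each group.

    Euclidean distance is an assumption; another metric could be swapped in.
    """
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    # Broadcast to an (n_first, n_second, n_dims) array of coordinate differences.
    differences = first[:, np.newaxis, :] - second[np.newaxis, :, :]
    return float(np.sqrt((differences ** 2).sum(axis=-1)).mean())


def recombinant_placement(parent_a, parent_b, recombinant):
    """Compare the A-B distance with the A-X and B-X distances (hypothetical helper)."""
    d_ab = mean_between_group_distance(parent_a, parent_b)
    d_ax = mean_between_group_distance(parent_a, recombinant)
    d_bx = mean_between_group_distance(parent_b, recombinant)
    return {
        "d_ab": d_ab,
        "d_ax": d_ax,
        "d_bx": d_bx,
        # The expectation from the text: A and B are farther apart than A-X and B-X.
        "recombinant_is_between": d_ax < d_ab and d_bx < d_ab,
    }


# Synthetic 2D "embedding" coordinates, for illustration only.
rng = np.random.default_rng(0)
parent_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2))
parent_b = rng.normal(loc=[10.0, 0.0], scale=0.5, size=(20, 2))
recombinant = rng.normal(loc=[5.0, 0.0], scale=0.5, size=(10, 2))

result = recombinant_placement(parent_a, parent_b, recombinant)
print(result)
```

With real data, the three arrays would hold the embedding coordinates of the samples assigned to lineages A, B, and X. The boolean flag only checks the ordering of the averages and says nothing about how much the distributions of distances overlap.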