blab / cartography

Dimensionality reduction distills complex evolutionary relationships in seasonal influenza and SARS-CoV-2
https://doi.org/10.1101/2024.02.07.579374
MIT License

Run HA/NA analysis with separate distance matrices for HA and NA #121

Closed: huddlej closed this issue 1 month ago

huddlej commented 1 month ago

Now that pathogen-embed supports multiple input distance matrices and alignments, we should update the HA/NA analysis workflow to build separate distance matrices for HA and NA instead of concatenating the two gene sequences into a single alignment. The updated analysis should produce more accurate embeddings for the distance-based methods. When HA and NA are concatenated, the trailing gaps of HA and the leading gaps of NA end up at the junction between the genes. With separate alignments, the pathogen-distances command can strip those gaps as the terminal gaps of each gene.
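A toy sketch of the gap problem described above, in plain Python. It is not the pathogen-distances implementation. The mismatch distance that ignores terminal gaps, the element-wise summation used to combine the per-gene matrices, and the example sequences are all hypothetical assumptions chosen only to show why gaps at the HA/NA junction inflate distances.

```python
# Hypothetical sketch, NOT the pathogen-distances implementation: a simple
# mismatch distance that ignores leading/trailing gaps, applied per gene.


def terminal_gap_bounds(seq: str) -> tuple[int, int]:
    """Return (first, last) indices of non-gap characters in an aligned sequence."""
    first = len(seq) - len(seq.lstrip("-"))
    last = len(seq.rstrip("-")) - 1
    return first, last


def distance(a: str, b: str) -> int:
    """Count mismatched columns, skipping columns inside either sequence's terminal gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    a_first, a_last = terminal_gap_bounds(a)
    b_first, b_last = terminal_gap_bounds(b)
    start, end = max(a_first, b_first), min(a_last, b_last)
    # Internal gaps still count as mismatches against a base.
    return sum(1 for i in range(start, end + 1) if a[i] != b[i])


def distance_matrix(seqs: list[str]) -> list[list[int]]:
    """Build a symmetric pairwise distance matrix for one alignment."""
    return [[distance(a, b) for b in seqs] for a in seqs]


def sum_matrices(*matrices: list[list[int]]) -> list[list[int]]:
    """Combine per-gene matrices by element-wise summation (an assumed combination rule)."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


# Hypothetical toy alignments: sample 1 is missing the end of HA and the start of NA.
ha = ["ACGT--", "ACGAAA"]
na = ["--TTAC", "GGTTAC"]

# Concatenating HA + NA moves those terminal gaps into the middle of each sequence.
concatenated = [h + n for h, n in zip(ha, na)]

print(distance_matrix(concatenated)[0][1])  # 5: the 4 junction gaps count as mismatches
print(sum_matrices(distance_matrix(ha), distance_matrix(na))[0][1])  # 1: only the real substitution
```

In the concatenated alignment, the four gap columns at the junction are no longer terminal, so they are counted as differences. Computing one matrix per gene lets each gene's terminal gaps be ignored before the matrices are combined.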

huddlej commented 1 month ago

See https://github.com/blab/cartography/pull/122 for details on why this issue was closed.