Now that `pathogen-embed` supports multiple input distance matrix files and alignments, we should update the HA/NA analysis workflow to build separate distance matrices for HA and NA instead of concatenating the gene sequences into a single file. This updated analysis should produce more accurate embeddings for the distance-based methods, since the `pathogen-distances` command will be able to strip trailing gaps from HA and leading gaps from NA that were included in the concatenated sequences.
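
To illustrate why separate matrices should help, here is a minimal sketch in Python. It does not use the actual distance implementation from the package. It assumes a simplified Hamming-style distance that ignores terminal gaps and counts every other mismatching column (including internal gaps) as one difference. The function names and the toy sequences are hypothetical.

```python
def ungapped_bounds(seq, gap="-"):
    """Return the (start, end) slice bounds left after dropping terminal gaps."""
    start = len(seq) - len(seq.lstrip(gap))
    end = len(seq.rstrip(gap))
    return start, end


def distance(a, b, gap="-"):
    """Simplified distance (an assumption, not the package's method):
    count mismatching columns, skipping columns that fall inside the
    leading or trailing gap run of either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    a_start, a_end = ungapped_bounds(a, gap)
    b_start, b_end = ungapped_bounds(b, gap)
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return sum(1 for x, y in zip(a[start:end], b[start:end]) if x != y)


# Toy aligned sequences: strain B is missing the end of HA and the start of NA.
ha_a, ha_b = "ATGCATGC", "ATGCAT--"
na_a, na_b = "GGCCTTAA", "--CCTTAA"

# Separate matrices: the missing ends are terminal gaps in each gene and are ignored.
separate = distance(ha_a, ha_b) + distance(na_a, na_b)

# Concatenated input: the same gaps now sit in the middle of the sequence,
# so they can no longer be stripped and are counted as differences.
concatenated = distance(ha_a + na_a, ha_b + na_b)

print(separate, concatenated)  # 0 4
```

In this toy case the two strains are identical wherever both were sequenced, yet the concatenated input reports a distance of 4 because the HA trailing gaps and NA leading gaps become internal.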