blab / pathogen-embed

Create reduced dimension embeddings for pathogen sequences
https://pypi.org/project/pathogen-embed/
MIT License

Replace internal sorting with check for order #32

Closed huddlej closed 1 month ago

huddlej commented 1 month ago

Replace the internal sorting of alignments and distance matrices with a check for consistent record order across inputs. This fixes bugs caused by unexpected mismatches between the inputs to and outputs from pathogen-embed. For example, pathogen-distance writes a distance matrix with sequence names in the same order as its input, but pathogen-embed would produce an embedding ordered alphabetically by sequence name. Instead of sorting, this commit checks that multiple alignment or distance matrix inputs list their records in the same order and throws an error when a mismatch is found.
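The order check described above could look roughly like the sketch below. This is an illustration only, not the code in this PR: the function name `check_record_order`, its parameters, and the example sequence names are all hypothetical, and the real implementation may compare records differently.

```python
def check_record_order(reference_names, other_names, label="input"):
    """Raise ValueError unless both inputs list the same records in the same order.

    Hypothetical helper for illustration; not part of the pathogen-embed API.
    """
    reference_names = list(reference_names)
    other_names = list(other_names)

    # Inputs with different numbers of records can never share an order.
    if len(reference_names) != len(other_names):
        raise ValueError(
            f"The {label} has {len(other_names)} records, "
            f"but {len(reference_names)} were expected."
        )

    # Compare names position by position and report the first mismatch.
    for position, (expected, observed) in enumerate(
        zip(reference_names, other_names)
    ):
        if expected != observed:
            raise ValueError(
                f"Record order mismatch in the {label} at position {position}: "
                f"expected '{expected}', found '{observed}'."
            )


# Assumed example names: an alignment and a distance matrix in the same order.
alignment_names = ["seq_b", "seq_a", "seq_c"]
distance_names = ["seq_b", "seq_a", "seq_c"]
check_record_order(alignment_names, distance_names, label="distance matrix")
```

Failing on the first mismatch, rather than silently reordering, keeps the output in the same order as the input and makes the problem visible to the user.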

Closes #28