blab / pathogen-embed

Create reduced dimension embeddings for pathogen sequences
https://pypi.org/project/pathogen-embed/
MIT License

Add alternate encodings for PCA inputs #23

Closed: huddlej closed this 2 months ago

huddlej commented 2 months ago

Adds alternate encodings of sequences for PCA inputs including:

Also names the original encoding "integer", since it maps each of A, C, G, and T to an integer (1, 2, 3, 4) and any other character to 5. The default encoding remains "integer", so this change is backward compatible. However, initial tests suggest that the simplex encoding may be a better default.
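
For readers unfamiliar with the "integer" encoding described above, here is a minimal sketch of the idea. It is not the pathogen-embed implementation: the function names `integer_encode` and `encode_alignment` are hypothetical, the A=1, C=2, G=3, T=4 ordering is assumed from the order the description lists the characters in, and uppercasing the input first is also an assumption.

```python
# Illustrative sketch only; these names are hypothetical and are not
# part of the pathogen-embed API.

# Assumed ordering: the description says ACGT map to (1, 2, 3, 4).
INTEGER_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}

# Any other character (N, gaps, ambiguity codes) maps to 5.
OTHER_CODE = 5


def integer_encode(sequence):
    """Encode one nucleotide sequence as a list of integers.

    Uppercasing the input is an assumption made for this sketch.
    """
    return [INTEGER_CODES.get(base, OTHER_CODE) for base in sequence.upper()]


def encode_alignment(sequences):
    """Encode equal-length aligned sequences as rows of a matrix.

    Each row is one sequence and each column is one alignment position,
    which is the shape a PCA implementation typically expects.
    """
    lengths = {len(sequence) for sequence in sequences}
    if len(lengths) > 1:
        raise ValueError("All sequences must have the same length.")
    return [integer_encode(sequence) for sequence in sequences]
```

For example, `encode_alignment(["ACGT", "ACNT"])` yields two rows that differ only in the third column, where the `N` is encoded as 5.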

Closes #22