blab / pathogen-embed

Create reduced dimension embeddings for pathogen sequences
https://pypi.org/project/pathogen-embed/
MIT License

Use inferred types for external embedding params #29

Closed huddlej closed 1 month ago

huddlej commented 1 month ago

Fixes a bug that occurs when setting the t-SNE learning rate parameter from an external file. The embedding script's internal logic tries to cast each value from the external file to the type of the default value of the corresponding embedding method argument. For example, the default perplexity is a float, so the logic converts the input file's value to a float even if it appears in the file as an integer. This logic fails for t-SNE's learning rate parameter, which has a default value of "auto" (a string) but also accepts float values. The logic casts any valid float to a string and passes that string to the scikit-learn t-SNE class, which throws a type error.
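The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not the script's actual code: the `defaults` dictionary and the `cast_to_default_type` helper are hypothetical names, and the default perplexity of 30.0 is assumed for illustration.

```python
# Hypothetical defaults standing in for the embedding method arguments.
defaults = {"perplexity": 30.0, "learning_rate": "auto"}


def cast_to_default_type(name, value):
    """Cast an external value to the type of the parameter's default value."""
    return type(defaults[name])(value)


# An integer perplexity is converted to a float, as intended.
perplexity = cast_to_default_type("perplexity", 25)

# A float learning rate is converted to a string because the default is "auto".
learning_rate = cast_to_default_type("learning_rate", 100.0)

print(type(perplexity).__name__, type(learning_rate).__name__)
```

Because the cast is driven by the default's type alone, any parameter that accepts more than one type is forced into the type of its default.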

This commit fixes the bug by removing the custom type-casting logic for the external parameters file and relying on the default type inference from pandas to set the correct type. This way, if the user defines a learning rate of "auto" in the CSV input, it will be parsed as a string. If the user defines a float like 100.0, it will be parsed as a float. If the user defines a value of any other type, like the integer 100, the script will throw an error.
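The pandas type inference this relies on can be sketched as follows. The CSV layout (one column per parameter, one row of values) and the `read_params` helper are assumptions made for illustration, not the script's actual input format or code.

```python
import io

import pandas as pd


def read_params(csv_text):
    """Read one row of parameters, keeping the type pandas infers per column."""
    df = pd.read_csv(io.StringIO(csv_text))
    # Read values column by column; taking the whole row at once would
    # coerce all values to a single common dtype.
    return {name: df[name].iloc[0] for name in df.columns}


# Hypothetical parameter files with a string and a float learning rate.
params_auto = read_params("perplexity,learning_rate\n30.0,auto\n")
params_float = read_params("perplexity,learning_rate\n30.0,100.0\n")

print(type(params_auto["learning_rate"]).__name__)
print(type(params_float["learning_rate"]).__name__)
```

With inference left to pandas, the value "auto" stays a string and 100.0 becomes a NumPy float, which is a subclass of Python's `float`.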