blab / pathogen-embed

Create reduced dimension embeddings for pathogen sequences
https://pypi.org/project/pathogen-embed/
MIT License

Add a quickstart guide #9

Closed: huddlej closed this issue 4 months ago

huddlej commented 7 months ago

Although we currently provide API documentation for the available commands in this package, we don't provide a quickstart tutorial that describes how to run these commands together.

We should make a brief quickstart guide in the README.md file that shows how to:

1. download some example flu data (maybe from the cartography project?)
2. run pathogen-distance on the alignment
3. run pathogen-embed on the distances and alignment
4. visualize the embedding figure
5. inspect the embedding output (head mds.csv)
6. inspect the soon-to-come pairwise distance plot to pick a distance threshold for clustering
7. run pathogen-cluster
8. inspect the cluster results (head cluster_mds.csv)
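To illustrate what picking a distance threshold from the embedding output involves, here is a minimal Python sketch that reads an embedding table and lists the Euclidean distance between every pair of records. It is not how the package computes anything: the column names (`strain`, `mds1`, `mds2`), the sample names, and the coordinates are made up for illustration, and the helpers `read_embedding` and `pairwise_distances` are hypothetical.

```python
import csv
import io
import math
from itertools import combinations


def read_embedding(handle, id_column, coordinate_columns):
    """Map each record's identifier to its coordinates as a tuple of floats."""
    reader = csv.DictReader(handle)
    return {
        row[id_column]: tuple(float(row[name]) for name in coordinate_columns)
        for row in reader
    }


def pairwise_distances(points):
    """Return (name_a, name_b, Euclidean distance) for every pair of records."""
    return [
        (a, b, math.dist(points[a], points[b]))
        for a, b in combinations(sorted(points), 2)
    ]


# Made-up embedding table standing in for a file like mds.csv.
# The column names and values are assumptions, not real pathogen-embed output.
example = io.StringIO(
    "strain,mds1,mds2\n"
    "sample_A,0.0,0.0\n"
    "sample_B,0.6,0.8\n"
    "sample_C,6.0,8.0\n"
)

points = read_embedding(example, "strain", ["mds1", "mds2"])
distances = pairwise_distances(points)
for a, b, distance in distances:
    print(f"{a}\t{b}\t{distance:.2f}")
```

In this toy table two samples sit close together and the third is far from both, so any threshold between the small and the large distances would separate them into two groups. A real quickstart would rely on the package's own distance plot instead.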

It would be even more helpful to show how to merge the cluster/embedding table with metadata from a Nextstrain analysis and visualize the results in Nextstrain, but I think that's for a separate future issue.
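As a rough idea of what that merge could look like, here is a minimal Python sketch that joins a cluster table to a metadata table on a shared strain column. Everything specific here is an assumption for illustration: the column names (`strain`, `date`, `mds1`, `mds2`, `mds_label`), the tab-delimited metadata layout, the sample values, and the hypothetical `merge_on_key` helper.

```python
import csv
import io


def merge_on_key(left_rows, right_rows, key):
    """Left-join two lists of dict records on a shared key column."""
    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        combined = dict(row)
        # Records without a match keep only their original columns.
        combined.update(right_by_key.get(row[key], {}))
        merged.append(combined)
    return merged


# Made-up metadata table (tab-delimited); the columns are assumptions.
metadata = list(csv.DictReader(io.StringIO(
    "strain\tdate\n"
    "sample_A\t2020-01-05\n"
    "sample_B\t2020-02-11\n"
), delimiter="\t"))

# Made-up cluster table standing in for a file like cluster_mds.csv.
clusters = list(csv.DictReader(io.StringIO(
    "strain,mds1,mds2,mds_label\n"
    "sample_A,0.0,0.0,0\n"
    "sample_B,6.0,8.0,1\n"
)))

merged = merge_on_key(metadata, clusters, key="strain")
for row in merged:
    print(row)
```

A left join keeps every metadata record even when a strain has no embedding, which seems like the safer default for a table that feeds a visualization.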