atarashansky / self-assembling-manifold

The Self-Assembling-Manifold (SAM) algorithm.
MIT License

Setting random seed for umap and clustering #36

suhaasa closed this issue 3 years ago

suhaasa commented 3 years ago

Hey Alec,

My lab is interested in using your algorithm (and maybe SAMap in the future). I'm trying to set a seed to get the same UMAP projection every time, but even with the seed set in the run function I still get a different scatter plot on each run. I think the clustering is also different, but I haven't done extensive testing there.

Thoughts?

Suhaas

atarashansky commented 3 years ago

Hey Suhaas!

By default, SAM uses the cosine distance between cells to calculate nearest neighbors, and that metric is handled by a hyper-fast kNN solver implemented by hnswlib. Unfortunately, hnswlib is not seedable, so running SAM with the same seed may still result in slightly different kNNs. All other distance metrics are implemented by PyNNDescent, which does allow seeding.

If you try using sam.run(distance='correlation', seed=0), do you get reproducible UMAPs?
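A quick way to check this is to run the same thing twice and compare the coordinates. The snippet below is only a sketch: `is_reproducible` and `sam_umap` are hypothetical helper names, and reading the projection from `sam.adata.obsm["X_umap"]` is an assumption about where SAM stores the UMAP. `seeded_embedding` is a stand-in so the snippet runs without SAM installed.

```python
import numpy as np


def is_reproducible(embed, n_runs=2, atol=0.0):
    """Call embed() n_runs times; True if every result matches the first."""
    first = np.asarray(embed())
    for _ in range(n_runs - 1):
        other = np.asarray(embed())
        if first.shape != other.shape:
            return False
        # rtol=0 so only the absolute tolerance `atol` is used
        if not np.allclose(first, other, atol=atol, rtol=0.0):
            return False
    return True


def sam_umap(sam, distance="correlation", seed=0):
    """Hypothetical helper: rerun SAM and return a copy of the 2D projection.

    Assumes `sam` is an already loaded and preprocessed SAM object and that
    the UMAP coordinates end up in sam.adata.obsm["X_umap"] (an assumption).
    """
    sam.run(distance=distance, seed=seed)
    return np.array(sam.adata.obsm["X_umap"])


def seeded_embedding(seed=0, n_cells=100):
    """Stand-in for a seeded projection, used only to demo the check."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_cells, 2))


if __name__ == "__main__":
    # Same seed on each call, so the two results should match exactly.
    print(is_reproducible(lambda: seeded_embedding(seed=0)))
```

With a real SAM object, the call would look like `is_reproducible(lambda: sam_umap(sam, distance="correlation", seed=0))`. Swapping in `distance="cosine"` should show whether the hnswlib path differs between runs on your data.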

Let me know! Alec

suhaasa commented 3 years ago

Thanks for the quick response! That did the trick. I did not know hnswlib was not seedable. I may stick with the cosine distance metric, since I think it produces better projections visually for our data. Excited to fully apply your method!