bebop / poly

A Go package for engineering organisms.
https://pkg.go.dev/github.com/bebop/poly
MIT License
663 stars, 70 forks

created a super minimal mash function for sketching sequences. #344

Closed. TimothyStiles closed this 10 months ago

TimothyStiles commented 11 months ago

This package is meant to help create "sketches" via the Mash sketching algorithm. There's plenty of room for optimization here, but I've implemented the core idea.

Mash: fast genome and metagenome distance estimation using MinHash.

Ondov, B.D., Treangen, T.J., Melsted, P. et al. Genome Biol 17, 132 (2016). https://doi.org/10.1186/s13059-016-0997-x

Mash Screen: high-throughput sequence containment estimation for genome discovery.

Ondov, B., Starrett, G., Sappington, A. et al. Genome Biol 20, 232 (2019). https://doi.org/10.1186/s13059-019-1841-x
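
For readers unfamiliar with the approach in the papers above, here is a minimal sketch of bottom-k MinHash sketching and the Mash distance formula from Ondov et al. (2016). It is not the code in this PR: the names `sketch`, `jaccard`, and `mashDistance` are hypothetical, FNV-1a from the standard library stands in for the MurmurHash3 that Mash uses, and canonical (reverse-complement aware) k-mers are left out for brevity.

```go
package main

import (
	"hash/fnv"
	"math"
	"sort"
)

// sketch returns up to sketchSize of the smallest distinct k-mer hashes
// in sequence, sorted ascending. FNV-1a is an assumption made here to
// stay within the standard library.
func sketch(sequence string, kmerSize, sketchSize int) []uint64 {
	seen := make(map[uint64]struct{})
	for i := 0; i+kmerSize <= len(sequence); i++ {
		h := fnv.New64a()
		h.Write([]byte(sequence[i : i+kmerSize]))
		seen[h.Sum64()] = struct{}{}
	}
	hashes := make([]uint64, 0, len(seen))
	for v := range seen {
		hashes = append(hashes, v)
	}
	sort.Slice(hashes, func(i, j int) bool { return hashes[i] < hashes[j] })
	if len(hashes) > sketchSize {
		hashes = hashes[:sketchSize]
	}
	return hashes
}

// jaccard estimates the Jaccard index of two sorted sketches by walking
// the smallest sketchSize hashes of their union and counting shared ones.
func jaccard(a, b []uint64, sketchSize int) float64 {
	i, j, shared, seen := 0, 0, 0, 0
	for seen < sketchSize && i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			shared++
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
		seen++
	}
	// Leftover hashes belong to the union but are not shared.
	seen += (len(a) - i) + (len(b) - j)
	if seen > sketchSize {
		seen = sketchSize
	}
	if seen == 0 {
		return 0
	}
	return float64(shared) / float64(seen)
}

// mashDistance converts a Jaccard estimate j into the Mash distance
// D = -1/k * ln(2j / (1 + j)), returning 1 when nothing is shared.
func mashDistance(j float64, kmerSize int) float64 {
	if j == 0 {
		return 1
	}
	return -math.Log(2*j/(1+j)) / float64(kmerSize)
}
```

Sketching two sequences with the same `kmerSize` and `sketchSize`, then passing the result of `jaccard` to `mashDistance`, gives a distance of 0 for identical sketches and 1 for sketches with no shared hashes.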

Koeng101 commented 11 months ago


TimothyStiles commented 11 months ago

Adding the help wanted tag because @Koeng101 wants this merged, but the following still needs to be done and I won't be able to get to it for at least a week:

  1. Keep hashes sorted for faster search.
  2. Figure out where this belongs at the package level, e.g. seqhash or maybe a new search package?
  3. Better comments and docs in general.
  4. 100% test coverage and more robust test cases.
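
As an illustration of item 1, one possible way to keep a sketch's hashes sorted is to insert with a binary search, which also makes membership checks O(log n). This is only a sketch under assumptions: `containsHash` and `insertSorted` are hypothetical helper names, not functions from this PR or from poly.

```go
package main

import "sort"

// containsHash reports whether h is present in hashes.
// hashes must already be sorted in ascending order.
func containsHash(hashes []uint64, h uint64) bool {
	i := sort.Search(len(hashes), func(i int) bool { return hashes[i] >= h })
	return i < len(hashes) && hashes[i] == h
}

// insertSorted inserts h into the ascending slice hashes, preserving
// order and skipping values that are already present.
func insertSorted(hashes []uint64, h uint64) []uint64 {
	i := sort.Search(len(hashes), func(i int) bool { return hashes[i] >= h })
	if i < len(hashes) && hashes[i] == h {
		return hashes // duplicate, nothing to do
	}
	hashes = append(hashes, 0)     // grow by one slot
	copy(hashes[i+1:], hashes[i:]) // shift the tail right
	hashes[i] = h
	return hashes
}
```

Insertion still costs O(n) because of the shift, so for building a sketch in bulk it may be cheaper to collect hashes first and sort once; the binary search pays off on repeated lookups.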

carreter commented 11 months ago

Linking this to #356

Koeng101 commented 10 months ago

I really like @soypat's changes in this case. They make a lot of sense to me.