czbiohub-sf / orpheum

Orpheum (previously called, and published as, sencha) is a Python package for directly translating RNA-seq reads into protein-coding sequence.
MIT License

Update to sourmash Nodegraph instead of khmer to allow for computation of minimum read length #77

Open olgabot opened 4 years ago

olgabot commented 4 years ago

With this PR (https://github.com/dib-lab/sourmash/pull/1009), the sourmash Nodegraph becomes more appealing to use than khmer's Nodegraph. The newly added n_unique_kmers attribute allows computing the minimum necessary read length for a given false positive rate with this equation:

[Screenshot: equation for the minimum read length, 2020-05-29]

olgabot commented 4 years ago

This is low-hanging fruit for speeding up the module, but backwards compatibility may be tricky for older bloom filters. There's probably a fairly straightforward try/except fallback to use here, though: try the new loader first, and fall back to the old one if that fails.
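
A minimal sketch of what that try/except fallback could look like, written generically so it doesn't depend on either library's exact API. The function name `load_bloom_filter` and the `(name, loader)` pairs are hypothetical, made up for illustration; the real loaders would be whatever load functions sourmash and khmer expose, and they may raise narrower exceptions than the broad `Exception` caught here.

```python
from typing import Any, Callable, Sequence, Tuple

# A loader is a (label, function) pair; the function takes a file path and
# returns the loaded bloom filter object. Both are placeholders here.
Loader = Tuple[str, Callable[[str], Any]]


def load_bloom_filter(path: str, loaders: Sequence[Loader]) -> Tuple[str, Any]:
    """Try each loader in order; return (label, object) for the first success.

    Hypothetical helper: pass the preferred (newer) loader first and the
    older one second, so old files still load.
    """
    errors = []
    for name, loader in loaders:
        try:
            return name, loader(path)
        except Exception as exc:  # assumption: real loaders may raise narrower errors
            errors.append(f"{name}: {exc!r}")
    # Every loader failed: report all the reasons in one message.
    raise ValueError(
        f"Could not load {path!r} with any loader: " + "; ".join(errors)
    )
```

Returning the label alongside the object lets the caller know which format was read, e.g. to skip the minimum-read-length computation when an older filter without n_unique_kmers was loaded.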