gmarcais / Jellyfish

A fast multi-threaded k-mer counter

Performance regression in Jellyfish 2.3.0 #192

Open kissake opened 1 year ago

kissake commented 1 year ago

I've attached a chart showing performance across multiple Jellyfish versions, genome sizes, values of -m (k-mer length), and values of -t (number of threads). The chart shows a significant performance regression (orders of magnitude) in version 2.3.0. The regression is related to threading (performance improves with -t set to 1) and to k-mer length (performance is particularly bad at -m 15). It also has some relation to input size: within the most problematic regime, relative performance improves with larger data.

jellyfish_performance_regression.pdf
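
For anyone who wants to reproduce a sweep like this, here is a minimal sketch of how the timings could be collected. It is not the methodology behind the attached chart. The input file name, the `-s` hash size, the output file name, and the helper names (`build_command`, `parameter_grid`, `time_command`, `run_sweep`) are all assumptions made for illustration; only the `jellyfish count` options `-m`, `-t`, `-s`, and `-o` are taken from the tool itself.

```python
import itertools
import subprocess
import time


def build_command(fasta, k, threads, binary="jellyfish",
                  size="100M", output="bench.jf"):
    """Build one `jellyfish count` command line.

    `binary` lets you point at a specific installed version.
    `size` and `output` are placeholder values (assumptions).
    """
    return [binary, "count",
            "-m", str(k),        # k-mer length
            "-t", str(threads),  # number of threads
            "-s", size,          # initial hash size
            "-o", output,
            fasta]


def parameter_grid(kmer_lengths, thread_counts):
    """Return every (k, threads) combination to benchmark."""
    return list(itertools.product(kmer_lengths, thread_counts))


def time_command(cmd):
    """Run a command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def run_sweep(fasta, kmer_lengths, thread_counts, binary="jellyfish"):
    """Time `jellyfish count` for each (k, threads) pair."""
    results = {}
    for k, threads in parameter_grid(kmer_lengths, thread_counts):
        cmd = build_command(fasta, k, threads, binary=binary)
        results[(k, threads)] = time_command(cmd)
    return results


# Example call (requires jellyfish on PATH and a real input file):
# timings = run_sweep("genome.fa", kmer_lengths=[15, 21], thread_counts=[1, 4])
```

Running `run_sweep` once per installed Jellyfish version (by passing a different `binary` path) and once per input genome would give a table comparable in shape to the chart. A single wall-clock measurement per point is noisy, so repeating each run a few times would be a reasonable refinement.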

I'm happy to provide more information, including original data, testing methodology, and performing additional tests.

Please let me know if there is any other way in which I can help.