biocore / qiime

Official QIIME 1 software repository. QIIME 2 (https://qiime2.org) has succeeded QIIME 1 as of January 2018.
GNU General Public License v2.0

parallel beta diversity is likely slower than single core beta diversity #1087

Closed wasade closed 11 years ago

wasade commented 11 years ago

List lookups are the devil. This is an awkward one to work through. Essentially, parallel beta diversity gets a phylogenetic row metric, which is generally UniFrac. This row metric does a lookup on the sample IDs that come in on the call. These sample IDs are then passed to a reorder method, which performs a list lookup followed, on success, by a second list lookup.
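To make the pattern concrete, here is a minimal sketch of the kind of reorder described above, next to a dict-based alternative. The function names and data are hypothetical and are not taken from the QIIME code base. The sketch assumes sample IDs are unique and that the "two list lookups" are a membership test followed by `list.index`.

```python
def reorder_with_list_lookup(current_ids, desired_ids, values):
    """Reorder values using list lookups: two linear scans per found ID."""
    reordered = []
    for sample_id in desired_ids:
        if sample_id in current_ids:                 # first list lookup, O(n)
            position = current_ids.index(sample_id)  # second list lookup, O(n)
            reordered.append(values[position])
    return reordered


def reorder_with_dict_lookup(current_ids, desired_ids, values):
    """Same result for unique IDs, with positions indexed once in a dict."""
    position_of = {sample_id: i for i, sample_id in enumerate(current_ids)}
    reordered = []
    for sample_id in desired_ids:
        position = position_of.get(sample_id)  # one hash lookup, O(1) on average
        if position is not None:
            reordered.append(values[position])
    return reordered


# Hypothetical data: values are stored in current_ids order.
current = ["s3", "s1", "s2"]
desired = ["s1", "s2", "s3", "missing"]
values = [30, 10, 20]

print(reorder_with_list_lookup(current, desired, values))  # [10, 20, 30]
print(reorder_with_dict_lookup(current, desired, values))  # [10, 20, 30]
```

With n sample IDs, the list version does on the order of n comparisons per requested ID, while the dict version pays a one-time O(n) cost to build the index.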

It took a bit to tease this out. For any list of sample IDs longer than a handful, fixing this could substantially speed up parallel beta diversity. This is an interesting beast, though: the number of sample IDs decreases with the number of parallel processes used, so it is actually a bit difficult to predict whether this would be a large benefit. Yet another motivation to profile.
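One rough way to check this is to time both lookup styles for different process counts. The sketch below uses the standard library `timeit` module. It assumes each worker looks up its own chunk of sample IDs against the full ID list, which is a guess about how the work is split and not something confirmed from the QIIME code. All names are hypothetical.

```python
import timeit


def positions_by_list(all_ids, wanted_ids):
    # Membership test plus index(): two linear scans per wanted ID.
    return [all_ids.index(s) for s in wanted_ids if s in all_ids]


def positions_by_dict(all_ids, wanted_ids):
    # Build the index once, then do constant-time lookups on average.
    index = {s: i for i, s in enumerate(all_ids)}
    return [index[s] for s in wanted_ids if s in index]


def time_lookups(n_samples, n_processes, repeats=3):
    """Return seconds spent by one hypothetical worker, per lookup style."""
    all_ids = ["sample_%d" % i for i in range(n_samples)]
    # Assumption: each worker handles an equal-sized slice of the sample IDs.
    chunk = all_ids[: max(1, n_samples // n_processes)]
    return {
        "list": timeit.timeit(lambda: positions_by_list(all_ids, chunk), number=repeats),
        "dict": timeit.timeit(lambda: positions_by_dict(all_ids, chunk), number=repeats),
    }


if __name__ == "__main__":
    for n_processes in (1, 4, 16):
        print(n_processes, time_lookups(n_samples=2000, n_processes=n_processes))
```

The absolute numbers depend on the machine and on the real split of work, so this only indicates where the crossover might be. Profiling the actual scripts is still the deciding step.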

wasade commented 11 years ago

This also appears to affect single-threaded beta diversity, here, here and here, and in the code those call into. It is contingent on the sample order not already being the desired order, so it is possible these branches are not normally exercised. If they are, they will be slow.

wasade commented 11 years ago

...and on a quick test with the tutorial dataset, the code in question is being executed. Will submit a pull request shortly.