biocore / BIRDMAn

Bayesian Inferential Regression for Differential Microbiome Analysis
BSD 3-Clause "New" or "Revised" License

concatenate_inferences is slow #57

Open mortonjt opened 3 years ago

mortonjt commented 3 years ago

I'm noticing that sometimes the concatenate_inferences method is the slowest part of the computation, even slower than MCMC sampling.

It looks like this can be sped up with dask: the trick is to rechunk your az.InferenceData object, and I believe dask will do the rest (so I don't think anything needs to be implemented here). However, it becomes very problematic when concatenating az.InferenceData objects with tens of thousands of features, possibly because the dask scheduler gets overwhelmed and all operations become single-threaded. I've raised this issue on the xarray discussions.

In that case, the workaround is to turn concatenate_inferences into a reduction operation (i.e. merge only 1000 datasets at a time, and then merge those intermediate results together).
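
A rough sketch of what that reduction could look like, written generically so it does not depend on BIRDMAn internals. The names `batched_reduce`, `concat`, and `batch_size` are hypothetical and not part of the existing API; `concat` stands in for whatever function joins a list of per-feature results into one.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def batched_reduce(
    items: Sequence[T],
    concat: Callable[[List[T]], T],
    batch_size: int = 1000,
) -> T:
    """Concatenate `items` in batches, then concatenate the batch results.

    Hypothetical helper: `concat` is any function that joins a list of
    objects into a single object of the same kind. No single call to
    `concat` ever receives more than `batch_size` objects.
    """
    if not items:
        raise ValueError("items must not be empty")
    if batch_size < 2:
        # A batch of one never shrinks the list, so the loop would not end.
        raise ValueError("batch_size must be at least 2")

    level = list(items)
    # Each pass shrinks the list by roughly a factor of batch_size.
    while len(level) > 1:
        level = [
            concat(level[i:i + batch_size])
            for i in range(0, len(level), batch_size)
        ]
    return level[0]


def join_lists(batch: List[List[int]]) -> List[int]:
    """Toy stand-in for the real concatenation: flatten a batch of lists."""
    return [x for part in batch for x in part]


# Ten single-element "datasets", merged three at a time.
merged = batched_reduce([[i] for i in range(10)], join_lists, batch_size=3)
```

With xarray, `concat` could be something like `lambda batch: xr.concat(batch, dim="feature")` applied to the posterior datasets, assuming "feature" is the dimension being concatenated along. Order is preserved because batches are taken as contiguous slices.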

Mainly raising this as an issue because this method is going to be problematic for larger datasets, and a reduce version of this function may be necessary.