I'm noticing that sometimes the `concatenate_inferences` method is the slowest part of the computation, even slower than MCMC sampling.
It looks like this can be sped up with dask -- the trick is to rechunk your `az.InferenceData` object, and I believe dask will do the rest (so no need to implement here, I think).
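If it helps, here's a rough sketch of what I mean by rechunking (the function name, chunk size, and the `"feature"` dimension name are just placeholders, not anything in the codebase):

```python
import arviz as az


def rechunk_inference(idata: az.InferenceData, chunks: dict) -> az.InferenceData:
    """Return a new InferenceData whose groups are dask-backed with the given chunks."""
    rechunked = {}
    for group in idata.groups():
        ds = getattr(idata, group)
        # only pass chunk sizes for dimensions actually present in this group
        group_chunks = {dim: size for dim, size in chunks.items() if dim in ds.dims}
        rechunked[group] = ds.chunk(group_chunks)
    return az.InferenceData(**rechunked)


# e.g. chunk along the feature dimension so dask can parallelize per-feature work
# idata = rechunk_inference(idata, {"feature": 1000})
```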
It does become very problematic when concatenating `az.InferenceData` objects with tens of thousands of features, possibly because the dask scheduler gets overwhelmed and all operations become single-threaded. I've raised this issue on the xarray discussions. In that case, the workaround is to turn `concatenate_inferences` into a reduction operation (i.e. merge only 1000 datasets at a time, and then merge those partial results together).
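Something along these lines is what I have in mind for the reduction version (just a sketch; the batch size and the assumption that `concatenate_inferences` can be called on a list of partial results are mine):

```python
from typing import Callable, List

import arviz as az


def concatenate_in_batches(
    inferences: List[az.InferenceData],
    concat_fn: Callable[[List[az.InferenceData]], az.InferenceData],
    batch_size: int = 1000,
) -> az.InferenceData:
    """Tree-style reduction: merge batch_size objects at a time until one remains."""
    while len(inferences) > 1:
        batches = [
            inferences[i:i + batch_size]
            for i in range(0, len(inferences), batch_size)
        ]
        # a single-element batch needs no merging, just carry it forward
        inferences = [
            batch[0] if len(batch) == 1 else concat_fn(batch)
            for batch in batches
        ]
    return inferences[0]


# usage idea, assuming the existing concatenate_inferences accepts a list:
# merged = concatenate_in_batches(inference_list, concatenate_inferences)
```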
Mainly raising this as an issue because this method is going to be problematic for larger datasets, and a reduce version of this function may be necessary.