alexrd opened this issue 12 months ago

For weighted ensemble analyses it is common to require access to all of the weights at once. Currently, reading the weights from a reasonably sized HDF5 file takes tens of minutes, whereas reading a newly computed observable takes only seconds.

Could this potentially be helped by arranging them all in their own group? E.g. ['runs/0/weights']. Or perhaps there is another way of ensuring that the weights are written to some contiguous region of the disk?
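For concreteness, here is a rough sketch of the access pattern in question, assuming a wepy-style layout where each walker's weights live at runs/&lt;run_idx&gt;/trajectories/&lt;traj_idx&gt;/weights; the file name and exact paths are assumptions, not taken from the project:

```python
import h5py

# Hypothetical illustration of "read all of the weights at once"; the layout
# (runs/<run_idx>/trajectories/<traj_idx>/weights) and file name are assumed.
all_weights = []
with h5py.File("results.wepy.h5", "r") as h5:
    for run_idx in h5["runs"]:
        trajs = h5[f"runs/{run_idx}/trajectories"]
        for traj_idx in trajs:
            # each iteration reads one small, heavily chunked dataset
            all_weights.append(trajs[f"{traj_idx}/weights"][:])
```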
This is related to #37, but not entirely.
Can you post a specific snippet just so we can separate the different pieces that might be slow? I suspect there are multiple bottlenecks, one being #37.
The other problem, which this issue addresses, is that the weights (or any field, really) are appended to the data structures as the simulation proceeds, and we usually don't know ahead of time how much to pre-allocate. Under the hood, every dataset/array in HDF5 is split into smaller chunks, and storage gets allocated a chunk at a time even if you don't use the whole chunk. Since we don't know how big a field array could get, we just extend the arrays one cycle's worth of data at a time. For position frames, having the chunk size correspond to a single frame isn't so bad, but for small arrays or scalars, like the weights, it is a problem, and that is what leads to the observed slowness.
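As a toy illustration of this chunking behavior (not wepy's actual writing code; the file, dataset names, and chunk sizes are made up), compare appending one cycle at a time into a dataset chunked per cycle versus one that packs many cycles per chunk:

```python
import numpy as np
import h5py

N_CYCLES = 10_000

with h5py.File("chunk_demo.h5", "w") as h5:
    # roughly how a small per-cycle field ends up today: one tiny chunk per cycle
    per_cycle = h5.create_dataset(
        "weights_tiny_chunks", shape=(0, 1), maxshape=(None, 1),
        dtype="f8", chunks=(1, 1))
    # same data, but many cycles packed into each chunk
    packed = h5.create_dataset(
        "weights_big_chunks", shape=(0, 1), maxshape=(None, 1),
        dtype="f8", chunks=(4096, 1))

    for _ in range(N_CYCLES):
        w = float(np.random.random())          # stand-in for a walker weight
        for dset in (per_cycle, packed):
            dset.resize(dset.shape[0] + 1, axis=0)
            dset[-1, 0] = w

with h5py.File("chunk_demo.h5", "r") as h5:
    # reading the poorly chunked dataset touches ~N_CYCLES chunks on disk,
    # the packed one only a handful
    slow = h5["weights_tiny_chunks"][:]
    fast = h5["weights_big_chunks"][:]
```

The packed layout wastes a little space in the final partially filled chunk, but reading it back requires far fewer I/O operations.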
Having a single weights group (runs/0/weights) won't actually help much, as it really has the same problem. You would get some extra chunking benefit from having a full cycle's worth of weights chunked together, but I would guess that benefit is still small (since num_walkers is typically much smaller than the number of cycles), and as you mentioned, the computed observables are fast to read even though they are scalar. This approach also breaks down if you have a variable number of walkers, for which better support is already implemented and lingering in #86, waiting to be pulled out into a new version that preserves backwards compatibility.
Solutions:
First we add support for specifying the chunk sizes for each field when creating a new HDF5 file (a sketch of what that could look like follows the list). With that in place we can pursue the following strategies, which are not mutually exclusive:

1. Pre-allocation: when the number of cycles is known, or can be estimated, ahead of time, size the field arrays and their chunks for the whole run up front instead of extending them one cycle at a time.
2. Rechunking after the fact: there is a command-line tool in h5tools for doing this, and you can rechunk the weights into a better layout, but it is quite a slow process when starting from a poorly chunked dataset; it is much better suited to optimizing an already reasonable one. It can also trim unused chunk space that was allocated but never used. I was thinking in general that we need a CLI tool for merging HDF5s, extracting data, listing info, etc., and this kind of tool should fit in nicely with that.
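A minimal sketch of what per-field chunk-size support at file creation could look like, assuming an h5py-backed writer; the FIELD_CHUNKS mapping, helper function, paths, and file name are hypothetical, not an existing wepy API. For strategy 2, the stock HDF5 tools include h5repack, which can rewrite datasets with a new chunk layout (its -l/--layout option) and could sit behind such a CLI.

```python
import h5py

# Hypothetical per-field chunk shapes; nothing here is an existing wepy API.
FIELD_CHUNKS = {
    "positions": (1, 1000, 3),   # one frame per chunk is fine for large fields
    "weights": (4096, 1),        # pack many cycles of weights into one chunk
}

def create_field(h5, traj_path, field, frame_shape, dtype="f8"):
    """Create an extendable per-cycle dataset using the per-field chunk shape."""
    chunks = FIELD_CHUNKS.get(field, (1,) + frame_shape)
    return h5.create_dataset(
        f"{traj_path}/{field}",
        shape=(0,) + frame_shape,
        maxshape=(None,) + frame_shape,
        chunks=chunks,
        dtype=dtype,
    )

with h5py.File("new_sim.wepy.h5", "w") as h5:
    weights = create_field(h5, "runs/0/trajectories/0", "weights", (1,))
```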
This is pretty high on my priority list as well, and the pre-allocation solution probably wouldn't be too hard to implement.