Closed (c-mita closed this 5 years ago)
It is known that the HDF5 library is not very efficient in multi-threaded contexts - read operations in particular are much faster in multi-process contexts.
Perhaps we could look at spawning a second process and transferring the data across shared memory? Sounds unpleasant.
HDF5 1.10.2 adds an H5DOread_chunk function to the high-level interface - the counterpart to the H5DOwrite_chunk function used by the odin file writer. It skips all pipeline operations (leaving it up to us to decompress) but may perform much faster.
Some early investigation using H5DOread_chunk yields mixed results.
Skipping a significant amount of the work done by the HDF5 library allows more of the work to be done in parallel (particularly decompression), but complicates the code and requires adding the decompression libraries. The calls to read the data chunks still take locks, so the degree of speed-up is highly dependent on the performance of the file system. What I've seen so far suggests a 50-100% slow-down at worst when compared to neggia's approach, with only minor differences when reading data off an SSD (a delta of a few seconds when reading 12000 frames, 0-20%).
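To make the shape of this concrete, here is a rough sketch of the pattern and not the plugin's actual code: the raw chunk read is serialised, and the decompression happens outside the lock so threads can overlap there. `read_raw_chunk` and `decompress_chunk` are hypothetical stand-ins (for H5DOread_chunk and the bitshuffle/LZ4 decompression respectively), and the frame count and chunk size are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_FRAMES 8      /* illustrative values only */
#define CHUNK_BYTES 16

/* Hypothetical stand-in for the raw chunk read; fills the buffer with frame + 1. */
static void read_raw_chunk(int frame, unsigned char *buf) {
    memset(buf, frame + 1, CHUNK_BYTES);
}

/* Hypothetical stand-in for decompression; here it just sums the bytes. */
static unsigned decompress_chunk(const unsigned char *buf) {
    unsigned sum = 0;
    for (size_t i = 0; i < CHUNK_BYTES; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Processes every frame and stores one result per frame in out[]. */
static int process_frames(unsigned out[N_FRAMES]) {
    int frame;
    assert(out != NULL);
    #pragma omp parallel for
    for (frame = 0; frame < N_FRAMES; frame++) {
        unsigned char buf[CHUNK_BYTES]; /* private to each iteration */

        /* Reads are serialised, mirroring the lock taken by the library. */
        #pragma omp critical(chunk_read)
        {
            read_raw_chunk(frame, buf);
        }

        /* Decompression runs outside the critical section, in parallel. */
        out[frame] = decompress_chunk(buf);
    }
    return N_FRAMES;
}
```

Built without OpenMP the pragmas are ignored and the loop simply runs serially. The point is only that the time spent inside the critical section is bounded by the file system, which is why the speed-up depends so heavily on it.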
I'll start trying to implement this strategy in the plugin to be used so long as all conditions for it are met (bitshuffle_lz4 being the only filter in the pipeline, one frame == one chunk, etc).
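A minimal sketch of what that eligibility check might look like, assuming the dataset properties have already been queried into a plain struct. `struct dataset_info` and `can_use_direct_chunk_read` are hypothetical names for illustration, not taken from the branch; 32008 is, as far as I know, the registered HDF5 filter ID for bitshuffle. Checking that the filter is actually configured for LZ4 is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BSHUF_FILTER_ID 32008 /* assumed: registered HDF5 filter ID for bitshuffle */

/* Hypothetical summary of the dataset properties relevant to the decision. */
struct dataset_info {
    int n_filters;        /* number of filters in the pipeline */
    int filter_ids[4];    /* filter IDs, in pipeline order */
    size_t frame_dims[2]; /* frame size: slow axis, fast axis */
    size_t chunk_dims[3]; /* chunk size: frames, slow axis, fast axis */
};

/* True only if the direct chunk read path is safe to use for this dataset. */
static bool can_use_direct_chunk_read(const struct dataset_info *d) {
    assert(d != NULL);
    /* Bitshuffle must be the only filter in the pipeline. */
    if (d->n_filters != 1 || d->filter_ids[0] != BSHUF_FILTER_ID) {
        return false;
    }
    /* One frame == one chunk: a single frame deep, covering the whole frame. */
    if (d->chunk_dims[0] != 1) {
        return false;
    }
    return d->chunk_dims[1] == d->frame_dims[0]
        && d->chunk_dims[2] == d->frame_dims[1];
}
```

If any condition fails, the plugin would fall back to the ordinary read path through the full HDF5 pipeline.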
This branch https://github.com/DiamondLightSource/durin/tree/chunk_read contains the work required to use H5DOread_chunk.
There is some refactoring work to do to clean up the code and remove unnecessary work.
The chunk_read branch was merged a while ago, and has significantly improved performance in the multi-threaded case, given a reasonably performant file system. In poor cases (e.g. slow mounts of networked file systems) it isn't as good, but there's not too much that can be done there.
Single threaded execution using Dectris' neggia plugin
Single threaded execution using durin
That's over double the time required...
Allowing parallel execution using OpenMP in the host process (four threads).
neggia
durin
This is a poor showing...