Initial pull request to validate QueueProcessor. QueueProcessor is an alternative to ThreadProcessor that is entirely deterministic: users control which events are read, and when, and decide when to promote data from next to current. Its performance is on par with ThreadProcessor when using 1 thread and 1 batch storage; ThreadProcessor outperforms it when the number of batch storages is greater than 1, running about 10% faster on a single node.
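The next/current double-buffering described above can be sketched in a few lines of pure Python (the class and method names here are hypothetical illustrations of the pattern, not the actual larcv3 API):

```python
import numpy as np

class MiniQueue:
    """Toy double-buffered reader: the user controls reads and promotion."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.next = None      # batch being staged
        self.current = None   # batch visible to the consumer

    def queue_batch(self, indices):
        # The user decides exactly which events are read, and when.
        self.next = np.asarray([self.dataset[i] for i in indices])

    def promote(self):
        # Deterministic promotion: 'next' becomes 'current', nothing implicit.
        self.current, self.next = self.next, None

data = list(range(100))
q = MiniQueue(data)
q.queue_batch([3, 1, 4])  # stage exactly these events
q.promote()               # make them visible
print(q.current)          # → [3 1 4]
```

Because promotion only happens when `promote()` is called, there is no background thread deciding when data appears, which is what makes the ordering reproducible.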
However, QueueProcessor is designed to be much more compatible with scaling out to MPI read-only IO, allowing fine-grained control over synchronizing IO calls.
Some MPI work is already done, and more is coming. Opening this pull request now allows tracking the test suite, though tests for QueueProcessor don't exist yet; they'll come before merging.
There are also some optimizations in batch_pydata and larcv3::BatchData. I moved the conversion from std::vector to numpy array so it is callable from BatchData, which removes much of the Python overhead (a 5x speedup on Mac!).
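The reason moving the conversion down into BatchData helps can be shown schematically in pure Python: copying elements one at a time in the interpreter vs a single bulk conversion. This is an illustration of the principle only, not the larcv3 code itself:

```python
import time
import numpy as np

# Stand-in for the contents of a std::vector<float>.
data = [float(i) for i in range(1_000_000)]

# Old path, schematically: per-element work driven from Python.
t0 = time.perf_counter()
slow = np.empty(len(data))
for i, x in enumerate(data):
    slow[i] = x
t_slow = time.perf_counter() - t0

# New path, schematically: one bulk conversion call, the loop runs
# below the interpreter (in larcv3's case, in C++ inside BatchData).
t0 = time.perf_counter()
fast = np.asarray(data)
t_fast = time.perf_counter() - t0

assert np.array_equal(slow, fast)
print(f"bulk conversion is ~{t_slow / t_fast:.0f}x faster here")
```

The exact ratio depends on the machine, but pushing the loop out of Python is where the reported speedup comes from.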
There are never any reviewers available ... sigh. Anyway, at the time of this comment the defaults are MPI off and OpenMP off, but the speedup gained by letting batch_data convert directly to numpy is big enough to merge this branch to develop in an intermediate state. So I'll merge.