DeepLearnPhysics / larcv3

Third version of larcv. This is a complete replacement for larcv2.
MIT License

Distributed io allranks #7

Closed marcodeltutto closed 5 years ago

marcodeltutto commented 5 years ago

This branch includes changes that allow running ThreadProcessor on several MPI processes, each reading different entries from the file. The start entry can now be specified explicitly, as can the number of entries to skip between two consecutive reads.

The distributed_larcv_interface now accepts the option "read_from_all_ranks" in its constructor. If this is set to true, an instance of larcv_threadio (and therefore of ThreadProcessor) will be present in every rank. The start entry of each rank is set to (rank × minibatch_size), and the number of entries to skip between two reads is set to (size × minibatch_size), where rank is the rank number and size is the number of ranks.
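The per-rank entry assignment described above can be sketched as follows. This is an illustration only, not the actual larcv_threadio API; the function name `entry_schedule` is hypothetical, while `rank`, `size`, and `minibatch_size` mirror the quantities in the comment:

```python
def entry_schedule(rank, size, minibatch_size, n_batches):
    """Return the starting entry of each batch this rank reads.

    Hypothetical sketch of the scheme described above; not larcv3 API.
    """
    start = rank * minibatch_size    # first entry this rank reads
    stride = size * minibatch_size   # entries skipped between two reads
    return [start + i * stride for i in range(n_batches)]

# With size=2 ranks and minibatch_size=3, the ranks read disjoint entries:
# rank 0 -> batches starting at 0, 6, 12, ... (entries 0-2, 6-8, ...)
# rank 1 -> batches starting at 3, 9, 15, ... (entries 3-5, 9-11, ...)
```

Because the stride equals size × minibatch_size, no two ranks ever read the same entry.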

marcodeltutto commented 5 years ago

The distributed_larcv_interface now has three options for reading the data:

  1. Can read from only one rank and distribute to all the other ranks;
  2. Can read from one local rank, and then distribute to all other ranks in the same node;
  3. Can read from all ranks, and no distribution is needed.

The desired mode is specified in the distributed_larcv_interface constructor:

  1. "read_from_single_rank"
  2. "read_from_single_local_rank"
  3. "read_from_all_ranks"

coreyjadams commented 5 years ago

I have updated this branch with the threadio tests and the hdf5-fix branch; if all tests pass, I will merge.

coreyjadams commented 5 years ago

All tests have passed. I am going to merge into develop.

It should be noted that there are no tests yet for the distributed IO interface. I will open an issue to work on this.