galsci / pysm

PySM 3: Sky emission simulations for Cosmic Microwave Background experiments
https://pysm3.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Scale PySM on parallel run #39

Open zonca opened 4 years ago

zonca commented 4 years ago

MSS-001 production run

zonca commented 4 years ago

(Image: IMG_20200211_142011)

The PySM operator distributes the channels equally across the groups and then runs within each group. It uses shared memory, so there is only 1 copy of the inputs per node and no redundant work. In each node, PySM should pick up a subset of the local TOD channels.

In the example in the image, we have 5 PySM channels per node, which are the first 5 of the 500 channels. Group 2 will have the second 5 channels.
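To make the channel split concrete, here is a minimal sketch of the assignment, assuming contiguous blocks and zero-based group indices. The function name `channels_for_group` is hypothetical and not part of PySM or TOAST; the 500 channels and 100 groups are the numbers used in this example.

```python
def channels_for_group(group, n_channels=500, n_groups=100):
    """Return the PySM channel indices handled by `group` (zero-based).

    Hypothetical helper: assumes channels are split into equal,
    contiguous blocks, one block per group.
    """
    if n_channels % n_groups != 0:
        raise ValueError("channels must divide equally across groups")
    if not 0 <= group < n_groups:
        raise ValueError("group index out of range")
    per_group = n_channels // n_groups  # 5 channels per group here
    start = group * per_group
    return list(range(start, start + per_group))


first_group = channels_for_group(0)   # the first 5 of the 500 channels
second_group = channels_for_group(1)  # the second 5 channels
```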

Once PySM has done bandpass integration for all the local channels, it broadcasts full maps across the group communicator to each node for its own PySM channels.

Then the maps of those channels, either 1 at a time or in chunks (configurable by the user), are broadcast across the rank communicator and put in shared memory, then rescanned locally by each process into the timelines. This is done in parallel on all the nodes of the first group, so we parallelize by a factor of 10. If we then do this broadcast for all 5 local PySM channels, we gain another factor of 5. So the loop over 5000 detectors becomes a loop over the 100 groups.
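The arithmetic behind that reduction can be checked in a few lines. This is only a sketch of the counting argument with the numbers quoted above; the variable names are illustrative.

```python
# Numbers quoted in the example above.
n_detectors = 5000       # length of the original serial loop
node_parallelism = 10    # nodes of one group working in parallel
channels_per_group = 5   # local PySM channels broadcast per group

# Each step over a group covers 10 * 5 = 50 detector iterations.
reduction_factor = node_parallelism * channels_per_group
remaining_iterations = n_detectors // reduction_factor

print(reduction_factor, remaining_iterations)
```

The remaining iteration count matches the number of groups, which is why the detector loop collapses into a loop over groups.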

In fact, once group 0 is done, group 1 does the same with its 5 PySM channels, and so on until all the work is done.

zonca commented 4 years ago

@keskitalo: please review the write-up above.

keskitalo commented 4 years ago

Exactly as I remember. This will be a huge improvement.