charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.

Node-level message aggregation for CkMulticast #1394

Open juanjgalvez opened 7 years ago

juanjgalvez commented 7 years ago

Original issue: https://charm.cs.illinois.edu/redmine/issues/1394


Because CkMulticastMgr is a group, it forwards multicast messages along a spanning tree of PEs. The problem is that if one of the PEs in the tree is busy, multicast messages wait on that PE even though another PE in the same node could have processed them.

The solution is to convert CkMulticastMgr to a nodegroup, so that the spanning trees are built over logical nodes (processes) instead of PEs. Ideally, the spanning tree algorithm will also be physical-node aware when topology information is present.
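
To make the proposal concrete, here is a minimal standalone sketch of building a tree over logical nodes instead of PEs. It is not the CkMulticastMgr implementation: `nodeOfPe`, `destinationNodes`, and `treeChildren` are hypothetical names, the PE-to-node mapping assumes PEs are numbered contiguously within each node, and the tree is a plain k-ary tree rooted at the first entry of the node list, with no physical-topology awareness.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper (not the Charm++ API): maps a PE to its logical node,
// assuming PEs are numbered contiguously within each node.
int nodeOfPe(int pe, int pesPerNode) {
    assert(pesPerNode > 0);
    return pe / pesPerNode;
}

// Collapse a list of destination PEs into the sorted list of distinct
// logical nodes that host them. Each node appears once, however many of
// its PEs are multicast destinations.
std::vector<int> destinationNodes(const std::vector<int>& pes, int pesPerNode) {
    std::set<int> nodes;
    for (int pe : pes) nodes.insert(nodeOfPe(pe, pesPerNode));
    return std::vector<int>(nodes.begin(), nodes.end());
}

// Children of the tree member at position `index` in a k-ary spanning tree
// laid out over `nodes` (position 0 is the root). Returns node ids.
std::vector<int> treeChildren(const std::vector<int>& nodes, int index, int branching) {
    assert(index >= 0 && branching > 0);
    std::vector<int> children;
    for (int c = 1; c <= branching; ++c) {
        std::size_t child = static_cast<std::size_t>(index) * branching + c;
        if (child < nodes.size()) children.push_back(nodes[child]);
    }
    return children;
}
```

With 4 PEs per node, destinations on PEs 0, 1, 2, 3, 4, 5, 8, 9 and 13 collapse to four tree members (nodes 0 to 3) rather than nine, and any idle PE on a node can forward on that node's behalf.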

stwhite91 commented 5 years ago

Original date: 2017-02-16 21:44:51


Core decided that a Node Group should be added on top of the current Group CkMulticastMgr.

PhilMiller commented 5 years ago

Original date: 2017-05-09 20:50:04


This won't be an API change, AFAICT, so it could be done in a patch release.

stwhite91 commented 5 years ago

Original date: 2017-08-30 19:47:27


Any update on this?

juanjgalvez commented 5 years ago

Original date: 2017-10-11 20:23:18


Currently debugging this on Blue Waters.

juanjgalvez commented 5 years ago

Original date: 2017-10-11 20:39:48


This is crashing on BW with 64 nodes.

The dependency chain for building the CkArray group is locMgr->mcastMgr->array. Apparently the crash occurs because dependencies involving nodegroups are not supported and are simply ignored. Because mcastMgr, now a nodegroup, sits in the middle of the chain, the end result is that NO dependency is enforced during creation.
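
The effect can be illustrated with a toy model. This is only a sketch of the reasoning above, not Charm++ code: `Entity` and `creationOrder` are hypothetical, and the model assumes that any dependency edge with a nodegroup at either end is dropped when nodegroup dependencies are not honored.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a creation request (not a Charm++ type).
struct Entity {
    std::string name;
    bool isNodegroup;
    std::string dependsOn;  // empty when there is no dependency
};

// Returns the order in which entities get created, given the order in which
// their creation requests arrive. An enforced dependency delays creation
// until the entity it names exists. When honorNodegroupDeps is false, every
// edge that touches a nodegroup is ignored (the assumed failure mode).
std::vector<std::string> creationOrder(const std::vector<Entity>& arrivals,
                                       bool honorNodegroupDeps) {
    std::set<std::string> nodegroups;
    for (const Entity& e : arrivals) {
        assert(!e.name.empty());
        if (e.isNodegroup) nodegroups.insert(e.name);
    }

    std::vector<std::string> order;
    std::set<std::string> created;
    std::vector<bool> done(arrivals.size(), false);
    bool progress = true;
    while (progress) {  // rescan until a full pass creates nothing
        progress = false;
        for (std::size_t i = 0; i < arrivals.size(); ++i) {
            if (done[i]) continue;
            const Entity& e = arrivals[i];
            bool touchesNodegroup =
                e.isNodegroup || nodegroups.count(e.dependsOn) > 0;
            bool enforced = !e.dependsOn.empty() &&
                            (honorNodegroupDeps || !touchesNodegroup);
            if (enforced && created.count(e.dependsOn) == 0) continue;  // wait
            order.push_back(e.name);
            created.insert(e.name);
            done[i] = true;
            progress = true;
        }
    }
    return order;
}
```

In this model, if the requests arrive as array, mcastMgr, locMgr and mcastMgr is a nodegroup, both edges of the chain touch it, so ignoring nodegroup dependencies lets the array be created first. Honoring them restores locMgr, mcastMgr, array.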

juanjgalvez commented 5 years ago

Original date: 2017-10-30 14:44:20


Respecting the dependencies during creation seems to solve the problem. Performance still needs to be tuned.

However, support for nodegroup dependencies does not exist yet in the main charm branch, and merging a good solution will probably take some time.