charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0

CmiPushPE doesn't account for message injection from outside the Charm++ runtime #3831

Open mayantaylor opened 3 months ago

mayantaylor commented 3 months ago

CmiPushPE is used in various contexts to add a message to a given processor's queue. CmiPushPE calls CmiMyRank() to determine whether the calling PE is the same as the destination PE; if so, the message can be enqueued via CmiSendSelf() without the synchronization that would otherwise be necessary.

In some cases, such as the HAPI callback API, CmiPushPE is called from outside the Charm++ runtime (e.g., from a CUDA thread). In this context, the expected behavior of CmiMyRank() is unclear: as far as I can tell, the rank is uninitialized and the call always returns zero. This could result in CmiSendSelf() being called from a thread that is not the destination PE, potentially causing data races or other synchronization issues.
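The failure mode described above can be illustrated with a small standalone model. This is only a sketch: `ToyPE`, `toy_my_rank`, and `toy_push_pe` are hypothetical names, not Charm++ APIs, and the model assumes the rank lives in per-thread storage that defaults to zero, which is my reading of the behavior rather than something confirmed in the runtime.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Toy model only: none of these names are Charm++ APIs.
// Assumption: the rank is per-thread state that is zero unless a PE sets it.
thread_local int toy_my_rank = 0;

struct ToyPE {
    std::mutex lock;         // guards `remote`
    std::deque<int> remote;  // messages pushed by other threads
    std::deque<int> local;   // owner-only queue, accessed without a lock
};

enum class Path { SelfUnlocked, RemoteLocked };

// Mirrors the decision described above: compare the caller's rank with the
// destination rank and skip synchronization when they match.
Path toy_push_pe(std::vector<ToyPE>& pes, int dest_rank, int msg) {
    ToyPE& pe = pes[dest_rank];
    if (toy_my_rank == dest_rank) {
        pe.local.push_back(msg);  // safe only if the caller really owns `pe`
        return Path::SelfUnlocked;
    }
    std::lock_guard<std::mutex> guard(pe.lock);
    pe.remote.push_back(msg);
    return Path::RemoteLocked;
}

// An external thread (standing in for a CUDA callback thread) pushes to
// rank 0 without ever having set toy_my_rank.
Path toy_external_push_to_rank0() {
    std::vector<ToyPE> pes(2);
    Path taken = Path::RemoteLocked;
    std::thread external([&] { taken = toy_push_pe(pes, 0, 42); });
    external.join();
    // The message landed on the owner-only queue even though the caller
    // was not the owner.
    assert(pes[0].local.size() == 1);
    return taken;
}
```

In this model the external thread is mistaken for rank 0 and takes the unsynchronized path. If the real rank 0 were draining `local` at the same time, the two accesses would race.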

We propose adding CmiPushPEExtern, which enqueues the given message without making any assumptions about the caller's location: it does not use CmiSendSelf() and instead assumes the caller is on a different PE. This would be useful for external message injection, such as the HAPI callback/CUDA thread interaction, or a similar callback-based approach in CkIO to interoperate with I/O-designated pthreads.
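A minimal sketch of the proposed behavior, again as a standalone toy model rather than runtime code: `ToyQueue` and `toy_push_pe_extern` are hypothetical names, and a `std::mutex` stands in for whatever synchronization the runtime's cross-PE queue actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Toy model only: ToyQueue and toy_push_pe_extern are hypothetical names.
struct ToyQueue {
    std::mutex lock;       // stand-in for the runtime's queue synchronization
    std::deque<int> msgs;
};

// Always takes the synchronized path and never inspects the caller's rank,
// so it is safe to call from a thread the runtime knows nothing about.
void toy_push_pe_extern(ToyQueue& dest, int msg) {
    std::lock_guard<std::mutex> guard(dest.lock);
    dest.msgs.push_back(msg);
}

// Several external threads inject concurrently; returns how many messages
// were delivered to the destination queue.
std::size_t toy_inject_from_threads(int n_threads, int per_thread) {
    ToyQueue dest;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&dest, per_thread] {
            for (int i = 0; i < per_thread; ++i) toy_push_pe_extern(dest, i);
        });
    }
    for (auto& w : workers) w.join();
    return dest.msgs.size();
}
```

The trade-off is that a caller that does happen to be on the destination PE pays for synchronization it does not strictly need, which seems acceptable for an entry point intended only for external callers.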

Additionally, CmiPushPEExtern should handle the CMK_SMP_MULTIQ case differently, because the CmiState object referred to here cannot be correctly initialized on an external thread, so the referenced field myGroupIdx will not hold the expected value. I'm not sure what the best workaround is, though, as I don't understand the CMK_SMP_MULTIQ usage.

mayantaylor commented 3 months ago

CMK_SMP_MULTIQ was introduced in commit c2a9701, and the CmiPushPE-specific usage was introduced in #1420.