CmiPushPE is used in various contexts to add a message to a given processor's queue. It calls CmiMyRank() to determine whether the calling PE is the same as the destination PE; if so, the message can be enqueued via CmiSendSelf(), skipping the synchronization that would otherwise be necessary.
In some cases, such as the HAPI callback API, CmiPushPE is called from outside the Charm++ runtime (e.g., from a CUDA thread). In this context, the expected behavior of CmiMyRank() is unclear (as far as I can tell, the rank is uninitialized and the call always returns zero). As a result, CmiSendSelf() could be called from a different processor than the destination, potentially causing data races or other synchronization issues.
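To make the concern concrete, here is a toy model of the hazard, not Charm++ code. It assumes the rank lives in zero-initialized thread-local storage, which matches what I observed but is not confirmed; `toyRank`, `looksLikeSelfSend`, and `checkFromExternalThread` are hypothetical names invented for this sketch.

```cpp
#include <thread>

// Hypothetical stand-in for per-thread runtime state (NOT the real CmiState).
// Assumption: a thread that never sets its rank sees zero.
thread_local int toyRank = 0;

// Hypothetical stand-in for the "is the caller the destination?" check.
bool looksLikeSelfSend(int destRank) { return toyRank == destRank; }

// Runs the check from a freshly created thread that never initialized
// toyRank, and returns what that thread observed.
bool checkFromExternalThread(int destRank) {
  bool result = false;
  std::thread t([&result, destRank] { result = looksLikeSelfSend(destRank); });
  t.join();  // join makes the write to result visible to the caller
  return result;
}
```

In this model, an external thread targeting rank 0 passes the self-send check even though it is not rank 0's thread, so it would take the unsynchronized path.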
We propose adding CmiPushPEExtern, which enqueues the given message without making any assumptions about the caller's location: it does not use CmiSendSelf() and instead always assumes the caller is on a different PE. This would be useful for external message injection, such as the HAPI callback/CUDA thread interaction, or a similar callback-based approach in CkIO for interoperating with designated I/O pthreads.
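The intended behavior could look roughly like the sketch below. This is a self-contained toy, not the real Converse queue: `ToyPeQueue`, `toyPushExtern`, and `toyInjectFromThreads` are hypothetical names, and a plain mutex stands in for whatever synchronization the actual per-PE queue uses. The point is only that the push never consults the caller's rank and always takes the synchronized path.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-PE message queue; messages are plain ints here.
struct ToyPeQueue {
  std::mutex lock;
  std::deque<int> msgs;
};

// Always-synchronized push: no rank check and no self-send fast path,
// so it is safe to call from any thread, inside or outside the runtime.
void toyPushExtern(ToyPeQueue& q, int msg) {
  std::lock_guard<std::mutex> guard(q.lock);
  q.msgs.push_back(msg);
}

// Simulates several external threads injecting messages concurrently and
// returns the number of messages in the queue after all of them finish.
std::size_t toyInjectFromThreads(ToyPeQueue& q, int nThreads, int perThread) {
  assert(nThreads >= 0 && perThread >= 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < nThreads; ++t) {
    workers.emplace_back([&q, t, perThread] {
      for (int i = 0; i < perThread; ++i) toyPushExtern(q, t * perThread + i);
    });
  }
  for (auto& w : workers) w.join();
  std::lock_guard<std::mutex> guard(q.lock);
  return q.msgs.size();
}
```

The trade-off is that a caller that really is on the destination PE pays for synchronization it does not need, which is why this would be a separate entry point rather than a change to CmiPushPE itself.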
Additionally, CmiPushPEExtern should handle the CMK_SMP_MULTIQ case differently: the CmiState object referred to in that code path cannot be correctly initialized for an external caller, so the referenced field myGroupIdx will not hold the expected value. I'm not sure what the best workaround is, though, as I don't understand how CMK_SMP_MULTIQ is used.