Closed by m8pple 3 years ago
Changing it from:
void * rtcl_func(void * args)
// Thread function spin on the MPI RTC, bleating every so often.
// DO NOT post messages from this thread, because you cannot rely on anything
// being alive at the end to get them, and MPI will block.
{
    RTCL::comms_t * pC = static_cast<RTCL::comms_t *>(args);
    double t_ = MPI_Wtime();
    double t;
    for(;;) {
        if (pC->l_kill) break;
        if (pC->l_stop) continue;
    }
to:
void * rtcl_func(void * args)
// Thread function spin on the MPI RTC, bleating every so often.
// DO NOT post messages from this thread, because you cannot rely on anything
// being alive at the end to get them, and MPI will block.
{
    RTCL::comms_t * pC = static_cast<RTCL::comms_t *>(args);
    double t_ = MPI_Wtime();
    double t;
    for(;;) {
        if (pC->l_kill) break;
        if (pC->l_stop) {
            OSFixes::sleep(10);
            continue;
        }
reduced CPU usage massively, and had no other discernible effect. Though I'm not really sure where RTCL is used from, or what exactly it delays. I'm assuming that the accuracy with which it prints "TICK" to stdout is not critical, given there will be lots of jitter on the file flushing anyway.
RTCL is designed to generate high-precision and high-accuracy events. The use-case of RTCL is to provide these events to another MPI process, e.g. the Mothership(s), on varying schedules. A uniform unloaded spinner was used to avoid needing to faff with variable length sleeps.
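For reference, one common middle ground between a pure spinner and plain sleeps is to sleep until shortly before an absolute deadline and spin only for the last stretch. The sketch below is not RTCL code: `wait_until_deadline`, `run_ticks` and the `spin_window` parameter are hypothetical names, and it uses `std::chrono::steady_clock` rather than `MPI_Wtime` so that it stands alone. Whether the resulting precision is good enough for the Mothership use-case is an open question.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical helper (not part of RTCL): wait for an absolute deadline,
// sleeping for most of the interval and spinning only for the final
// `spin_window`, so the CPU is idle nearly all of the time.
inline void wait_until_deadline(Clock::time_point deadline,
                                std::chrono::microseconds spin_window)
{
    assert(spin_window.count() >= 0);
    const auto coarse = deadline - spin_window;
    if (Clock::now() < coarse) {
        std::this_thread::sleep_until(coarse);  // yields the CPU
    }
    while (Clock::now() < deadline) {
        // Short busy-wait to absorb scheduler wake-up jitter.
    }
}

// Hypothetical tick generator: fires `count` events `period` apart.
// Deadlines are absolute, so per-tick lateness does not accumulate.
template <typename OnTick>
void run_ticks(int count,
               std::chrono::microseconds period,
               std::chrono::microseconds spin_window,
               OnTick on_tick)
{
    auto next = Clock::now() + period;
    for (int i = 0; i < count; ++i) {
        wait_until_deadline(next, spin_window);
        on_tick(i);
        next += period;  // variable schedules would compute `next` differently
    }
}
```

Because each deadline is computed from the previous deadline rather than from "now", a variable schedule only changes how `next` is advanced; the waiting logic stays the same.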
Nothing currently uses RTCL in actual deployment (implementation is pending, but will be part of the Supervisor API).
The way forward is that we will not run RTCL for the time being, and will bring it back when functionality using it has been implemented.
This is related to #236, but the root cause is a bit different.
Once CommonBase is able to avoid calling OnIdle all the time, we get left with a thread still spinning at 100%:
It is just locked in rtcl_func, and looks like it is spinning on a single tiny loop:
My reading of x86 is not what it used to be, plus perf profiling is not that accurate, but that test then jal sequence looks like an infinite loop to me. Possibly related to #237.
Even if it is not an infinite loop, it is still wasting a huge amount of time and battery. I don't see why it can't sleep for 1ms or something (ideally longer, as that still probably keeps CPUs in a high power state).
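To illustrate the "ideally longer" option: rather than polling the stop flag with a fixed sleep, the thread could block on a condition variable while stopped and only wake when a flag changes. This is a minimal sketch under assumptions, not the actual RTCL implementation: `ClockControl` and `clock_loop` are hypothetical stand-ins for `RTCL::comms_t` and `rtcl_func`, and it assumes the flags can be guarded by a mutex.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the l_kill / l_stop flags in RTCL::comms_t.
struct ClockControl {
    std::mutex m;
    std::condition_variable cv;
    bool kill = false;
    bool stop = true;  // start in the stopped state

    void set_stop(bool s) {
        { std::lock_guard<std::mutex> lk(m); stop = s; }
        cv.notify_all();
    }
    void request_kill() {
        { std::lock_guard<std::mutex> lk(m); kill = true; }
        cv.notify_all();
    }
};

// Hypothetical thread body: uses no CPU while stopped, and counts one tick
// per elapsed `period` while running. Returns the number of ticks.
inline int clock_loop(ClockControl& c, std::chrono::milliseconds period)
{
    assert(period.count() > 0);
    int ticks = 0;
    std::unique_lock<std::mutex> lk(c.m);
    while (!c.kill) {
        if (c.stop) {
            // Block until either flag changes; no polling interval needed.
            c.cv.wait(lk, [&] { return c.kill || !c.stop; });
            continue;
        }
        // Running: wait one period, but wake early on kill or stop.
        const bool interrupted =
            c.cv.wait_for(lk, period, [&] { return c.kill || c.stop; });
        if (!interrupted) ++ticks;  // the full period elapsed
    }
    return ticks;
}
```

While stopped, the thread is descheduled entirely, so the core can drop into a low power state instead of waking every millisecond. The trade-off is that whoever sets the flags has to do so through `set_stop` and `request_kill` so that the waiter gets notified.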