Ordered send does not work in some cases

songweijia commented 3 years ago

In one of the Cascade evaluation experiments, I need to issue an ordered send to ask all nodes in a shard to dump their timestamp logs. This mechanism was working properly in most cases. But in a special case where the containing subgroup is the second subgroup of the same type, the ordered send never finishes. Tracing into the execution using gdb, I found MulticastGroup::delivery_trigger() is not properly triggered in the predicated thread on the sender node. For the very first ordered send, testing shows that locally_stable_sst_messages[subgroup_num] is not empty and the first sequence read in least_undelivered_sst_seq_num is 0. However, the min_stable_num variable gets -1, unexpectedly excluding the code guarded by this condition. The message never gets delivered.

I'm not sure if this always happens for the subgroups which are not the first one of a subgroup type. It needs more investigation.

I've talked to Sagar about this since he knows the best of this part. He will investigate it later. I can find a workaround for my tests but it would be great if we fix this thoroughly.

songweijia commented 3 years ago

The test case can be found in the dairy_farm_exp branch of cascade, commit c8e97a671832d1329bdbdd12ee8e8aebdbb25c2c

KenBirman commented 3 years ago

I'm wondering if this could be an instance of #215

As discussed in that thread, it is no longer good enough to just declare the SST as volatile. We are supposed to shift to using std::atomics, with "relaxed" consistency for guarded objects in the SST, but acquire-release consistency for the guards themselves, such as the "SMC message is ready" bit, the "how many messages I have sent" guard, etc.

Specifically, this suggests that at a minimum our main predicate loop should have an acquire-release atomic at the top of the loop, like a counter that can be incremented each time we loop. This would signal to the hardware that we will be accessing data that may have changed due to I/O. It is interesting to realize that we don't need an actual lock and that any variable of the proper std::atomic type (acquire-release, specifically) will suffice. That would create a memory fence, causing a very brief pause while asynchronous updates get a chance to reach DRAM and to invalidate stale cache entries, and then the next iteration of the predicate loop will see fresher data if anything has recently arrived...

Derecho-Project / derecho

Ordered send does not work in some cases #219