Derecho-Project / derecho

The main code repository for the Derecho project.
BSD 3-Clause "New" or "Revised" License
182 stars 46 forks source link

Fix unit tests that relied on global stability callbacks #264

Closed etremel closed 7 months ago

etremel commented 8 months ago

I discovered while running regression tests for the Derecho 2.4 release that many of our unit tests had been broken by commit eb5b281a31277faea6dd61649d057fcb094ec941, which disabled global stability callbacks for RPC ("cooked") messages. This change made sense from an API standpoint, since the global stability callback is designed for "raw" messages; it receives a pointer to the delivered byte buffer, but it can't do anything useful with the buffer if it contains serialized RPC function arguments (unless it knows the types of objects to deserialize). However, several of our unit tests that used Replicated Objects with RPC messages were relying on the global stability callback to determine when the test had finished; they ignored the contents of the message in the callback and only looked at the message number.

I rewrote these tests to use mechanisms other than the global stability callback to determine when the last message in an experiment was delivered, and what its message/version number was. In most cases this involved using the DeserializationContext feature from the mutils-serialization library, which Derecho has supported for a while but has mostly gone unused (except internally for setting the PersistentRegistry in persistent objects). An object containing experiment data (e.g. the last message number, a flag indicating when the last message was received) is created in the main thread and passed to the Derecho group constructor as a DeserializationContext, which means Derecho will then provide a pointer to that object in the DeserializationManager that gets passed to the from_bytes function when deserializing a Replicated Object. The one downside is this means all the Replicated Object classes need custom from_bytes functions (they can't use DEFAULT_SERIALIZATION_SUPPORT) so they can retrieve the DeserializationContext pointer from the DeserializationManager and use it to initialize an instance variable.

Note that at this point there is still one test in the performance_tests folder that has not been fixed: persistent_latency_test. I haven't had time to rewrite and test it yet (it's a slightly more complicated test), but I wanted to create the pull request to ensure these fixes made it in to the 2.4 release.