Closed 9 months ago
Created by: Fuuzetsu
ClusterDeath changes reverted and PR rebased. Please approve if the rest is OK.
Created by: facundominguez
Maybe undo the ClusterDeath changes and merge the rest. It looks like a bug in the EQs or replicated-log.
Created by: facundominguez
Same comment applies for all places below where timeouts were introduced to wait for messages. The test already has a global timeout in case a message takes too long to arrive.
Created by: facundominguez
No time-sensitive behavior, please. It would be fine to block indefinitely here.
Created by: facundominguez
ClusterDeath changes look wrong. When the cluster dies, the cluster dies. It does not perform an organized choreography to first kill a satellite and then the tracking station.
halon should guarantee at some point that the service will be restarted no matter how nodes die. Currently the test assumes that if the service was started (it says "Hello World"), halond thinks it should be online and won't change its mind if nodes die.
Wouldn't this be the least surprising behavior?
Closing as obsolete
Created by: Fuuzetsu
Most changes were fairly straightforward: our rules have changed but no one updated the tests, so some message acks were missing, the ordering of structures changed, etc.
initial-data-doesn't-error: we now use formulaic pvers and have a test for initial data using those. Instead of wasting time trying to update the test to do something sane with the simple strategy, just remove it.
ClusterDeath: Remove the race where halonds were killed before angel had a chance to start recovery on the node. This allows the node to rejoin after the RC restart every time.
Enabling in castor in HALON-470 after this one lands