Seagate / halon

High availability solution
Apache License 2.0

[HALON-469] Fix broken tests #1006

Closed 1468ca0b-2a64-4fb4-8e52-ea5806644b4c closed 9 months ago

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 8 years ago

Created by: Fuuzetsu

Most changes were fairly straightforward: our rules have changed but no one updated the tests. Some message acks were missing, the ordering of structures changed, etc.

initial-data-doesn't-error: we now use formulaic pvers and have a test for initial data using those. Instead of wasting time trying to update the test to do something sane with the simple strategy, just remove it.

ClusterDeath: Remove the race of halonds being killed before angel had a chance to start recovery on the node. This allows the node to rejoin after RC restart every time.


Enabling in castor will happen in HALON-470 after this one lands.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 8 years ago

Created by: Fuuzetsu

ClusterDeath changes reverted and PR rebased; please approve if the rest is OK.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 8 years ago

Created by: facundominguez

Maybe undo the ClusterDeath changes and merge the rest. It looks like a bug in EQs or replicated-log.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 8 years ago

Created by: facundominguez

The same comment applies to all places below where timeouts were introduced to wait for messages. The test already has a global timeout in case a message takes too long to arrive.

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 8 years ago

Created by: facundominguez

No time-sensitive behavior, please. It would be fine to block indefinitely here.
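
For illustration only, the pattern being asked for (block indefinitely on each expected message and rely on a single global test timeout) can be sketched as follows. This is a language-neutral sketch in Python, not halon's actual test code; `run_with_global_timeout` and `expect` are hypothetical helper names, and the mailbox is assumed to be a plain in-process queue.

```python
import queue
import threading


def run_with_global_timeout(test_body, seconds):
    """Run test_body in a worker thread under one global deadline (hypothetical helper)."""
    result = {}

    def worker():
        try:
            result["value"] = test_body()
        except BaseException as exc:  # propagate any failure to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(seconds)  # the only timeout in the whole test
    if thread.is_alive():
        raise TimeoutError(f"test exceeded global timeout of {seconds}s")
    if "error" in result:
        raise result["error"]
    return result["value"]


def expect(mailbox, wanted):
    """Block indefinitely until `wanted` arrives, skipping unrelated messages."""
    while True:
        msg = mailbox.get()  # no per-message timeout here
        if msg == wanted:
            return msg


if __name__ == "__main__":
    mailbox = queue.Queue()
    mailbox.put("unrelated")
    mailbox.put("Hello World")
    # The wait itself never times out; only the enclosing test does.
    print(run_with_global_timeout(lambda: expect(mailbox, "Hello World"), seconds=5))
```

With this shape, a slow machine only makes the test slower; it fails solely when the global deadline is exceeded, so no individual wait encodes a timing assumption.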

1468ca0b-2a64-4fb4-8e52-ea5806644b4c commented 8 years ago

Created by: facundominguez

ClusterDeath changes look wrong. When the cluster dies, the cluster dies. It does not perform an organized choreography to first kill a satellite and then the tracking station.

Halon should guarantee at some point that the service will be restarted no matter how nodes die. Currently the test assumes that if the service was started (it says "Hello World"), halond thinks it should be online and won't change its mind if nodes die.

Wouldn't this be the least surprising behavior?

shailesh-vaidya commented 9 months ago

Closing as obsolete.