Hoverbear / old-raft-rs

[Incomplete] A Raft implementation in Rust
https://hoverbear.github.io/raft-rs/raft/
MIT License
266 stars, 41 forks

Add comprehensive tests, especially for fail recovery #12

Open foodhype opened 9 years ago

foodhype commented 9 years ago

First, I suggest breaking up the current basic_test into many tests.

Second, I suggest having a test that (1) completely kills the process on which a node is running; (2) restarts a completely new process (with no knowledge from other processes) to replace the dead process; (3) blocks issuing any new commands until some stabilization period has passed; and (4) asserts that the new node is able to get back up to speed with the old state on its own. (Leader recovery and follower recovery are separate cases, obviously.)
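
Steps (3) and (4) could be sketched with a small polling helper like the one below. This is only an illustration: `wait_until` and its parameters are hypothetical names, not part of this crate, and the `caught_up` closure stands in for whatever check the test uses to compare the new node's state with the old state.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: polls `caught_up` every `interval` until it returns
/// true or `timeout` elapses. Returns whether the condition was ever met.
fn wait_until<F>(timeout: Duration, interval: Duration, mut caught_up: F) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        // Check first, so an already-stable cluster returns immediately.
        if caught_up() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}
```

A test would issue no new commands while this runs and then assert on the return value, so a node that never recovers fails the test instead of hanging it.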

Running the nodes in separate processes guarantees that no state is shared between nodes except through asynchronous message passing. Killing a process completely and abruptly allows testing for edge cases that occur during real machine failures, such as reusing old sockets, cleaning up resources, recovering state from scratch (for the recovering process), and failure detection/handling (by neighbor processes).
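
A rough sketch of the kill-and-replace step using only `std::process` follows. The helper name `kill_and_replace` and the `spawn_node` closure are assumptions made for illustration; how a node binary is actually launched is left to the caller.

```rust
use std::io;
use std::process::{Child, ExitStatus};

/// Hypothetical helper: abruptly kills a node process and starts a fresh
/// replacement. `spawn_node` is an assumed closure that launches one node.
fn kill_and_replace<F>(old: &mut Child, spawn_node: F) -> io::Result<(ExitStatus, Child)>
where
    F: FnOnce() -> io::Result<Child>,
{
    // On Unix, `Child::kill` sends SIGKILL, so the node cannot clean up first.
    old.kill()?;
    // Reap the dead process so it does not linger as a zombie.
    let status = old.wait()?;
    // The replacement is a brand new process and inherits no in-memory state.
    let fresh = spawn_node()?;
    Ok((status, fresh))
}
```

Because the old process is killed rather than asked to shut down, anything it left behind (bound sockets, partially written files) is what the replacement and its neighbors have to cope with, which is the point of the test.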

Common methods of testing include having a master spawn processes, issue a sequence of commands, stabilize, and then kill the process completely. Other methods include having a time bomb mechanism whereby the master server commands processes to immediately crash themselves after performing a certain number of commands, which allows more fine-grained scenario testing.
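
As a minimal sketch of the time bomb idea, assuming each node counts the commands it applies: `TimeBomb` and `apply_with_bomb` are hypothetical names, and how the master delivers the fuse length to a node is not shown.

```rust
/// Hypothetical time bomb: counts applied commands and reports when the
/// node should crash. A fuse of 0 behaves like a fuse of 1.
#[derive(Debug)]
struct TimeBomb {
    remaining: u64,
}

impl TimeBomb {
    fn new(commands_before_crash: u64) -> Self {
        TimeBomb { remaining: commands_before_crash }
    }

    /// Call once per applied command; returns true once the fuse runs out.
    fn tick(&mut self) -> bool {
        self.remaining = self.remaining.saturating_sub(1);
        self.remaining == 0
    }
}

/// Applies one command, then crashes the process if the bomb has gone off.
fn apply_with_bomb<F: FnOnce()>(bomb: &mut TimeBomb, apply: F) {
    apply();
    if bomb.tick() {
        // `abort` terminates immediately, without unwinding or destructors.
        std::process::abort();
    }
}
```

Keeping the counter separate from the crash itself means the counting logic can be unit tested in-process, while the actual `abort` only happens inside a spawned node.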

Hoverbear commented 9 years ago

Absolutely! basic_test is bigger than it needs to be. Thanks for these suggestions!