Closed sanebay closed 3 months ago
I think we can have a general test case covering both baseline and incremental resync; the pattern is similar, the only difference is:
X > #snapshots_to_keep * snapshot_distance ? mode = baseline : mode = incremental
while (#IO < TARGET_IOS_TO_RUN) {
1. write some data
2. shut down one replica
3. write X reqs
4. start up the down replica
5. [optionally] keep writing during startup
6. wait for the replica to sync
}
// Let's also have a huge one:
erase all data on one replica (maybe go through format again?) and let it resync from the beginning.
It needs a night of long running; hopefully we can exercise recovery many times, rather than exercising a single recovery with a big gap.
Yeah, this is a nice test case to add. We can add it to our SM long-running test; the raft_repl UT is more of a functionality test with smaller and larger numbers of inputs.
In SM it is hard to ensure we go through incremental or baseline. I think it is a waste to generate 1M IOs and only test one recovery :)
In our HomeStore 4.x replication long-running test, we definitely should have a test case that runs recovery a few hundred times overnight (we also do this with the Nublox 1.3 HomeStore long run), with I/Os running before and after each recovery, and that keeps rebooting one replica. Our goal is to "abuse" HomeStore so badly that it won't "abuse" us in production.
That being said, this can be done by someone else in another PR (create an issue to track it).
Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
Project coverage is 67.40%. Comparing base (1a0cef8) to head (a7c2b9c). Report is 49 commits behind head on master.
| Files | Patch % | Lines |
|---|---|---|
| .../lib/replication/log_store/home_raft_log_store.cpp | 0.00% | 1 Missing :warning: |
Add batching of read and write values in the UT. Keep a mapping from index to key-value pairs for quick lookup during snapshot read. Add shutdown and start to simulate: follower goes down, writes complete, follower starts back up. Earlier we used to restart with a sleep, but for 1 million IOs, or other values of num_io, we don't know how long to sleep.
To test:
./test_raft_repl_dev --gtest_filter=RaftReplDevTest.BaselineTest --num_io=1000000 --log_mods=replication:debug --config_path ./config --dev_size_mb=2048600 --snapshot_distance=0