eBay / HomeStore

Storage Engine for block and key/value stores.
Apache License 2.0
23 stars 21 forks

Add batching in UT to support million ios for baseline test. #510

Closed sanebay closed 3 months ago

sanebay commented 3 months ago

Add batching of read and write values in the UT. Keep a mapping from index to key/value pairs for quick lookup during snapshot read. Add shutdown and start to simulate a follower going down, writes completing, and the follower starting back up. Earlier we used to restart with a sleep, but for 1 million I/Os, or different values of num_io, we don't know how long to sleep.

To test: ./test_raft_repl_dev --gtest_filter=RaftReplDevTest.BaselineTest --num_io=1000000 --log_mods=replication:debug --config_path ./config --dev_size_mb=2048600 --snapshot_distance=0

xiaoxichen commented 3 months ago

I think we can have a general test case for both baseline and incremental. The pattern is similar; the only difference is whether X > #snapshots_to_keep * snapshot_distance ? mode = baseline : mode = incremental.

while (#IO < TARGET_IOS_TO_RUN) {
  1. write some data
  2. shut down one replica
  3. write X reqs
  4. start up the down replica
  [5. optionally] keep writing during startup
  6. wait for the replica to sync
}
// Let's also have a huge one:
erase the data on one replica (maybe go through format again?) and let it resync from the beginning.

This needs a nightly long run; hopefully we can exercise recovery many times rather than exercising a single recovery with a big gap.

sanebay commented 3 months ago

> It needs a night for long running, hope we can exercise recovery more times vs exercise a recovery with big gap.

Yeah, this is a nice test case to add. We can add it to our SM long-running test. The raft_repl UT is more of a functionality test with smaller and larger numbers of inputs.

xiaoxichen commented 3 months ago

In SM it is hard to ensure we go through incremental or baseline. I think it is a waste to generate 1M I/Os and only test one recovery :)

yamingk commented 3 months ago

> I think we can have a general test case for both baseline and incremental, the pattern is similar, only difference is if X > #snapshot_to_keep * snapshot_distance ? mode = baseline : mode = incremental
>
> while (#IO < TARGET_IOS_TO_RUN) {
>   1. write some data
>   2. shut down one replica
>   3. write X reqs
>   4. start up the down replica
>   [5. optionally] keep writing during startup
>   6. wait for the replica to sync
> }
> // Let's also have a huge one: erase the data on one replica (maybe go through format again?) and let it resync from the beginning.
>
> It needs a night for long running, hope we can exercise recovery more times vs exercise a recovery with big gap.

In our HomeStore 4.x replication long-running test, we definitely should have a test case that runs recovery a few hundred times during the night (we also do this with the Nublox 1.3 HomeStore long run), with I/Os running before and after each recovery, and that keeps rebooting one replica. Our goal is to "abuse" HomeStore so badly that it won't "abuse" us in production.

That being said, it can be done by someone else and in another PR (create an issue and track it).

codecov-commenter commented 3 months ago

:warning: Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 67.40%. Comparing base (1a0cef8) to head (a7c2b9c). Report is 49 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| .../lib/replication/log_store/home_raft_log_store.cpp | 0.00% | 1 Missing :warning: |

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##           master     #510       +/-   ##
===========================================
+ Coverage   56.51%   67.40%   +10.89%
===========================================
  Files         108      109        +1
  Lines       10300    10419      +119
  Branches     1402     1398        -4
===========================================
+ Hits         5821     7023     +1202
+ Misses       3894     2717     -1177
- Partials      585      679       +94
```

:umbrella: View full report in Codecov by Sentry.