MrCroxx / RunKV

[WIP] An experimental cloud-native distributed KV engine for OLTP workload.
MIT License

purge raft log files with low utilization rate #114

Open MrCroxx opened 2 years ago

MrCroxx commented 2 years ago

For each raft group, log entries before the compact index can be safely deleted. Although the log entries of a single group are written contiguously in most cases, the entries of different groups are interleaved within the same log file.

e.g.

file 1: | group 1, 0 - 100 | group 2, 0 - 50 | group 1, 101 - 200 | group 2, 51 - 100 |

As a result, as the system runs, some parts of a log file become purgeable while other parts do not. For example, 90% of file 1 may be purgeable while only 50% of file 2 is. We call the fraction of a file that is still needed its utilization rate, so here file 1 has a utilization rate of 10% and file 2 of 50%.

So we can track the (approximate) utilization rate of each file. When the utilization rate of a file drops below the threshold, we rewrite the file (append the still-needed parts to the current active log file and update their in-memory indices) and then remove the old log file. The rewrite is safe because we record the term of each raft log entry, so we can easily tell whether a log entry may overwrite an existing one with the same index.
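
A minimal sketch of the selection step, to make the idea concrete. All names here (`Segment`, `utilization`, `files_to_rewrite`) are hypothetical and not taken from the RunKV code base, and the sketch counts entries rather than bytes for simplicity. It assumes an entry is still needed when its index is at or after its group's compact index.

```rust
use std::collections::HashMap;

/// A contiguous run of entries of one raft group inside a log file.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone)]
struct Segment {
    group: u64,
    first_index: u64, // inclusive
    last_index: u64,  // inclusive
}

/// Fraction of entries in a file that are still needed, i.e. whose index is
/// at or after the compact index of their group. An empty file counts as
/// fully utilized so that it is never selected for rewriting.
fn utilization(segments: &[Segment], compact_index: &HashMap<u64, u64>) -> f64 {
    let mut total = 0u64;
    let mut live = 0u64;
    for s in segments {
        total += s.last_index - s.first_index + 1;
        // Groups without a recorded compact index keep all of their entries.
        let compact = compact_index.get(&s.group).copied().unwrap_or(0);
        let first_live = s.first_index.max(compact);
        if first_live <= s.last_index {
            live += s.last_index - first_live + 1;
        }
    }
    if total == 0 {
        1.0
    } else {
        live as f64 / total as f64
    }
}

/// Ids of the files whose utilization is below `threshold`, in ascending order.
fn files_to_rewrite(
    files: &HashMap<u64, Vec<Segment>>,
    compact_index: &HashMap<u64, u64>,
    threshold: f64,
) -> Vec<u64> {
    let mut ids = Vec::new();
    for (id, segments) in files {
        if utilization(segments, compact_index) < threshold {
            ids.push(*id);
        }
    }
    ids.sort_unstable();
    ids
}
```

With the layout of file 1 from the example above and assumed compact indices of 151 for group 1 and 101 for group 2, only 50 of the 302 entries are still needed, so the utilization is roughly 17% and the file would be selected under a threshold of 20%.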

The rewrite procedure is asynchronous, and the following aspects should be taken into consideration:

  1. The utilization rate can be approximate or accurate, but collecting the statistics should not slow down foreground writes much (e.g., by acquiring a mutex).
  2. The throughput and frequency of background rewrites should be limited so that they do not interfere with foreground writes.
  3. The threshold should be reasonable: neither too high nor too low.
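
One possible way to address points 1 and 2, sketched under assumptions: per-file counters kept in atomics so the write path never takes a lock, plus a simple per-tick byte budget for the background rewriter. `FileStats` and `RewriteBudget` are hypothetical names, not existing RunKV types, and the budget is only one of several ways to throttle.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Per-file counters updated without a mutex (hypothetical type).
#[derive(Default)]
struct FileStats {
    total_bytes: AtomicU64,
    purgeable_bytes: AtomicU64,
}

impl FileStats {
    /// Foreground write path: a single relaxed atomic add.
    fn on_append(&self, bytes: u64) {
        self.total_bytes.fetch_add(bytes, Ordering::Relaxed);
    }

    /// Called when a compact index advances past `bytes` worth of entries.
    fn on_compact(&self, bytes: u64) {
        self.purgeable_bytes.fetch_add(bytes, Ordering::Relaxed);
    }

    /// Approximate utilization: the two loads are not one atomic snapshot,
    /// which is acceptable because the rate only needs to be approximate.
    fn utilization(&self) -> f64 {
        let total = self.total_bytes.load(Ordering::Relaxed);
        if total == 0 {
            return 1.0;
        }
        let purgeable = self.purgeable_bytes.load(Ordering::Relaxed).min(total);
        (total - purgeable) as f64 / total as f64
    }
}

/// Byte budget the background rewriter may spend per scheduling tick.
struct RewriteBudget {
    bytes_per_tick: u64,
    used: u64,
}

impl RewriteBudget {
    fn new(bytes_per_tick: u64) -> Self {
        Self { bytes_per_tick, used: 0 }
    }

    /// Consumes budget and returns true if `bytes` still fits in this tick.
    fn try_consume(&mut self, bytes: u64) -> bool {
        if self.used.saturating_add(bytes) > self.bytes_per_tick {
            return false;
        }
        self.used += bytes;
        true
    }

    /// Called at the start of each tick to restore the full budget.
    fn reset(&mut self) {
        self.used = 0;
    }
}
```

Relaxed ordering is enough here because the counters are only statistics and nothing else is synchronized through them. The threshold itself (point 3) is left as a tuning parameter.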

MrCroxx commented 2 years ago

cc @zackertypical