etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0
47.95k stars 9.78k forks source link

Run a separate in memory snapshot to reduce number of entries stored in raft memory storage #18825

Open serathius opened 3 weeks ago

serathius commented 3 weeks ago

Part of https://github.com/etcd-io/etcd/issues/17098

Alternative to https://github.com/etcd-io/etcd/pull/18635

Goal: Reduce number of raft entries stored in memory

Context:

Proposal:

Benchmark results ./bin/tools/benchmark put --total=15000 --val-size=100000

Scenario Snapshot count Memory snapshot count CPU[%] Memory[MB] PUTs per second
Base 10000 n/a 49.4 1954.5 1029.2
Snapshot in memory 10000 1 51.1 1330.4 1006.1
Snapshot in memory 10000 10 48.9 1264.2 1046.3
Snapshot in memory 10000 100 48.1 1210.4 1051.8
Snapshot in memory 10000 1000 48.2 1360.0 1047.2
Snapshot in memory 10000 10000 49.4 2040.3 1027.4
k8s-ci-robot commented 3 weeks ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

k8s-ci-robot commented 3 weeks ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/etcd-io/etcd/blob/main/OWNERS)~~ [serathius] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
codecov-commenter commented 3 weeks ago

Codecov Report

Attention: Patch coverage is 87.75510% with 6 lines in your changes missing coverage. Please review.

Project coverage is 68.74%. Comparing base (8f91eb1) to head (e4d3691).

Files with missing lines Patch % Lines
server/etcdserver/server.go 87.50% 4 Missing and 2 partials :warning:
Additional details and impacted files | [Files with missing lines](https://app.codecov.io/gh/etcd-io/etcd/pull/18825?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io) | Coverage Δ | | |---|---|---| | [server/etcdserver/api/membership/cluster.go](https://app.codecov.io/gh/etcd-io/etcd/pull/18825?src=pr&el=tree&filepath=server%2Fetcdserver%2Fapi%2Fmembership%2Fcluster.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io#diff-c2VydmVyL2V0Y2RzZXJ2ZXIvYXBpL21lbWJlcnNoaXAvY2x1c3Rlci5nbw==) | `88.53% <100.00%> (ø)` | | | [server/etcdserver/server.go](https://app.codecov.io/gh/etcd-io/etcd/pull/18825?src=pr&el=tree&filepath=server%2Fetcdserver%2Fserver.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io#diff-c2VydmVyL2V0Y2RzZXJ2ZXIvc2VydmVyLmdv) | `81.76% <87.50%> (-0.48%)` | :arrow_down: | ... and [24 files with indirect coverage changes](https://app.codecov.io/gh/etcd-io/etcd/pull/18825/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io) ```diff @@ Coverage Diff @@ ## main #18825 +/- ## ========================================== - Coverage 68.80% 68.74% -0.07% ========================================== Files 420 420 Lines 35575 35584 +9 ========================================== - Hits 24479 24463 -16 - Misses 9670 9691 +21 - Partials 1426 1430 +4 ``` ------ [Continue to review full report in Codecov by Sentry](https://app.codecov.io/gh/etcd-io/etcd/pull/18825?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://app.codecov.io/gh/etcd-io/etcd/pull/18825?dropdown=coverage&src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io). Last update [8f91eb1...e4d3691](https://app.codecov.io/gh/etcd-io/etcd/pull/18825?dropdown=coverage&src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=etcd-io).
serathius commented 1 day ago

ping @ahrtr

ahrtr commented 1 day ago

Overall looks good to me.

We don't keep up to 100K (--snapshot-count) + 5K (catch-up-entries) raft log entries any more, instead, we only keep at most 100 (--compact-raft-log-interval-indices) + 5K.

serathius commented 2 hours ago

/retest