etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

Include and stabilize `experimental-compaction-sleep-interval` flag in releases #18481

Closed JalinWang closed 2 weeks ago

JalinWang commented 2 months ago

What would you like to be added?

Two parameters govern the auto compaction process: `experimental-compaction-batch-limit` and `experimental-compaction-sleep-interval`. Despite being added three years ago in this PR/commit, the sleep interval flag has yet to be included in any release. Meanwhile, the batch limit flag is already being considered for stabilization in a separate issue, and I propose stabilizing `experimental-compaction-sleep-interval` as well.

Why is this needed?

Compaction significantly affects service response time, so it is desirable to spread the pressure more evenly, which is exactly what these two parameters provide. Workarounds exist today, but tuning the retention window offers only limited flexibility, and using the built-in mechanism is preferable to maintaining separate external scripts.
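For illustration, here is a minimal sketch of how these flags combine with periodic auto compaction; the values below are only examples, not recommendations:

```sh
# Illustrative only: compact in smaller batches with pauses in between,
# so client requests are served between batches instead of stalling behind
# one long compaction pass.
etcd --auto-compaction-mode=periodic \
     --auto-compaction-retention=1h \
     --experimental-compaction-batch-limit=1000 \
     --experimental-compaction-sleep-interval=100ms
```

A larger sleep interval with a smaller batch limit trades a longer total compaction time for a lower per-batch impact on foreground traffic.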


ivanvc commented 2 months ago

Discussed during the fortnightly triage meeting. I'll review the PR.

JalinWang commented 2 months ago

> Discussed during the fortnightly triage meeting. I'll review the PR.

Thanks for the update! I'm looking forward to your feedback~

Also, the following bbolt PR can greatly improve etcd performance in our scenario, where the free space (dbSize - dbSizeInUse) is considerable at times. If possible, could you also share the release plan for 1.4.0? alpha.0 was released in January and alpha.1 in May, so the next version might be expected around September. That would be a great step toward a stable 1.4.0. (Although we'll still need to wait for etcd 3.6 😫)

## v1.4.0-alpha.0 (2024-01-12) change log
- [Record the count of free page to improve the performance of hashmapFreeCount](https://github.com/etcd-io/bbolt/pull/585)

Attachment: our pprof result screenshot (dbSize ~11GB, dbSizeInUse ~6GB).

ivanvc commented 2 months ago

@JalinWang, can you help with the CHANGELOG pull request to mention #18514?

Regarding the bbolt change, I'd suggest opening an issue on its repository.

Thanks!

JalinWang commented 1 month ago

> @JalinWang, can you help with the CHANGELOG pull request to mention #18514?

Sorry for the late PR. Please review: https://github.com/etcd-io/etcd/pull/18556 :)

> Regarding the bbolt change, I'd suggest opening an issue on its repository.

Okay, will do~

elias-dbx commented 3 weeks ago

Hello, is there any guidance on how to tweak --experimental-compaction-batch-limit and --experimental-compaction-sleep-interval for large clusters?

We have ~40GB etcd databases which create around 2000 new revisions per second. We run compaction once every 30 minutes but see availability drops due to pauses during compaction time.

JalinWang commented 3 weeks ago

> Hello, is there any guidance on how to tweak --experimental-compaction-batch-limit and --experimental-compaction-sleep-interval for large clusters?

Hi~ Personally, I adjusted --experimental-compaction-sleep-interval to a higher value and decreased --experimental-compaction-batch-limit to distribute the compaction load evenly across the whole auto compaction interval (typically 1h). This should minimize response-time spikes during compaction runs.
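As a rough back-of-the-envelope illustration of that idea for the workload you describe (~2000 revisions/s, compaction every 30 minutes); the arithmetic below is a sizing sketch under assumed values, not a measured recommendation:

```sh
# Sizing sketch (illustrative, assuming a batch limit of 1000 keys):
#   revisions accumulated per run : 2000 rev/s * 1800 s ≈ 3.6M
#   batches per run               : 3,600,000 / 1000    = 3600
#   sleep to spread over ~30 min  : 1800 s / 3600       ≈ 500ms per batch
# The sleep is in addition to the time each batch itself takes, so in practice
# a somewhat smaller value would be chosen so each run finishes before the next.
etcd --experimental-compaction-batch-limit=1000 \
     --experimental-compaction-sleep-interval=500ms
```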

I found an article online (link; in Chinese, Google Translate may help) about optimizing etcd for large clusters (~10k nodes), which mentions the compaction sleep interval parameter. However, it doesn't provide any specific guidance on tuning these two parameters. If you come across any other resources, please share them with me :)

elias-dbx commented 2 weeks ago

Once we upgrade to 3.5.16 I will try tweaking the compaction sleep interval and report back. We run up to 15k nodes in our k8s clusters.

ivanvc commented 2 weeks ago

I'll close this issue as the backport is complete and is already part of the 3.5.16 release. Please reopen if you feel there's more work to do.

Thanks, @JalinWang, for your contribution.