Discussed during the fortnightly triage meeting. I'll review the PR.
> Discussed during the fortnightly triage meeting. I'll review the PR.
Thanks for the update! I'm looking forward to your feedback~
Also, the following PR for bbolt can greatly improve etcd performance in our scenario, where the free space (dbSize - dbSizeInUse) is considerable at times. If possible, could you also share the release plan for 1.4.0? alpha.0 was released in January and alpha.1 in May, so the next version might be expected around September. That would be a great step toward a stable 1.4.0. (Although we'll still need to wait for etcd 3.6 😫)
## v1.4.0-alpha.0 (2024-01-12) change log
- [Record the count of free page to improve the performance of hashmapFreeCount](https://github.com/etcd-io/bbolt/pull/585).
Attachment: our pprof result screenshot (dbSize ~11 GB, dbSizeInUse ~6 GB)
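For anyone curious, here is a minimal sketch (in Go) of the idea behind that bbolt PR as I understand it; the names and types are illustrative, not the actual bbolt code. The point is to keep a running counter of free pages so the free count is O(1) instead of re-walking the hashmap-based freelist on every call.

```go
// Illustrative sketch only, not bbolt's real freelist implementation.
package freelistsketch

// freelist is a toy stand-in for a hashmap-based freelist:
// freemaps maps a run size to the set of starting page ids of free runs.
type freelist struct {
	freemaps       map[uint64]map[uint64]struct{}
	freePagesCount uint64 // maintained incrementally (the idea behind the PR)
}

// free records a run of `size` contiguous free pages starting at `pid`.
func (f *freelist) free(size, pid uint64) {
	if f.freemaps == nil {
		f.freemaps = make(map[uint64]map[uint64]struct{})
	}
	if f.freemaps[size] == nil {
		f.freemaps[size] = make(map[uint64]struct{})
	}
	f.freemaps[size][pid] = struct{}{}
	f.freePagesCount += size // O(1) bookkeeping instead of recounting later
}

// freeCountSlow mimics the old behaviour: walk every bucket on every call.
func (f *freelist) freeCountSlow() uint64 {
	var n uint64
	for size, pids := range f.freemaps {
		n += size * uint64(len(pids))
	}
	return n
}

// freeCount is the optimized behaviour: constant time.
func (f *freelist) freeCount() uint64 {
	return f.freePagesCount
}
```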
@JalinWang, can you help with the CHANGELOG pull request to mention #18514?
Regarding the bbolt change, I'd suggest opening an issue on its repository.
Thanks!
> @JalinWang, can you help with the CHANGELOG pull request to mention #18514?
Sorry for the late PR. Plz review: https://github.com/etcd-io/etcd/pull/18556 :)
> Regarding the bbolt change, I'd suggest opening an issue on its repository.
okkkkk~
Hello, is there any guidance on how to tweak `--experimental-compaction-batch-limit` and `--experimental-compaction-sleep-interval` for large clusters?
We have ~40 GB etcd databases that create around 2000 new revisions per second. We run compaction once every 30 minutes, but we see availability drops due to pauses during compaction.
> Hello, is there any guidance on how to tweak `--experimental-compaction-batch-limit` and `--experimental-compaction-sleep-interval` for large clusters?
Hi~
Personally, I adjusted `--experimental-compaction-sleep-interval` to a higher value and decreased `--experimental-compaction-batch-limit` to distribute the compaction load evenly across the whole auto compaction interval (typically 1h). This should minimize the response-time (RT) spikes during compaction runs.
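To make the trade-off concrete, here is a rough back-of-envelope model (my own, not an official formula) using the numbers from the question above (~2000 revisions/s, compaction every 30 minutes): one compaction run sleeps roughly (revisions to compact / batch limit) × sleep interval in total, and the goal is for that to approach, but stay under, the compaction period.

```go
// Back-of-envelope sizing for the two compaction flags. The flag names are
// real; the sizing rule and the candidate values are my own rough model,
// not an official recommendation.
package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		revisionsPerSecond = 2000             // from the scenario above
		compactionPeriod   = 30 * time.Minute // auto compaction interval

		// Candidate flag values to evaluate (illustrative, not defaults):
		batchLimit    = 1000                   // --experimental-compaction-batch-limit
		sleepInterval = 100 * time.Millisecond // --experimental-compaction-sleep-interval
	)

	revisionsPerRun := revisionsPerSecond * int(compactionPeriod.Seconds()) // ~3.6M revisions
	batches := revisionsPerRun / batchLimit                                 // ~3600 batches
	totalSleep := time.Duration(batches) * sleepInterval                    // ~6 minutes of pauses

	fmt.Printf("revisions per run: %d, batches: %d, total sleep: %s (period: %s)\n",
		revisionsPerRun, batches, totalSleep, compactionPeriod)
}
```

With these illustrative values, one run only pauses for about 6 minutes out of a 30-minute period, so you would lower the batch limit and/or raise the sleep interval further if you want the work spread across most of the interval.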
I found an article online (link, in Chinese; Google Translate may help) about optimizing etcd for large clusters (~10k nodes), which mentions the compaction-sleep-interval parameter. However, it doesn't provide any specific guidance on tuning these two parameters. If you come across any other resources, please share them with me :)
Once we upgrade to 3.5.16 I will try tweaking the compaction sleep interval and report back. We run up to 15k nodes in our k8s clusters.
I'll close this issue as the backport is complete and is already part of the 3.5.16 release. Please reopen if you feel there's more work to do.
Thanks, @JalinWang, for your contribution.
### What would you like to be added?
Two parameters govern the auto compaction process: `experimental-compaction-batch-limit` and `experimental-compaction-sleep-interval`. Despite being added three years ago in this PR (commit), the sleep interval flag has yet to be included in any release. Meanwhile, the batch limit flag is under stabilization consideration in this issue, and I propose stabilizing `experimental-compaction-sleep-interval` as well.
### Why is this needed?
Compaction significantly affects service response time, so distributing the pressure more evenly is desirable, which is exactly what these two parameters enable. While workarounds exist today, adjusting the retention window size offers limited flexibility, and it's better to use the built-in mechanism than to maintain separate, independent maintenance scripts.
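For readers who haven't looked at these flags before, here is a simplified sketch (with made-up names, not etcd's actual implementation) of how the two parameters interact: compaction proceeds in batches of at most the batch limit, sleeping for the sleep interval between batches, so a smaller batch limit and a longer sleep interval stretch one compaction run over more wall-clock time instead of doing all the work at once.

```go
// Illustrative sketch only, not etcd's actual compaction code.
package compactsketch

import "time"

// deleteBatchFunc is a hypothetical callback that removes up to `limit`
// compacted revisions and reports whether more work remains.
type deleteBatchFunc func(limit int) (more bool)

// runCompaction deletes revisions in bounded batches, pausing between
// batches so other requests are not starved by one long burst of work.
func runCompaction(deleteBatch deleteBatchFunc, batchLimit int, sleepInterval time.Duration) {
	for {
		// experimental-compaction-batch-limit bounds how much is deleted in one go.
		if more := deleteBatch(batchLimit); !more {
			return
		}
		// experimental-compaction-sleep-interval spreads the remaining work
		// out over wall-clock time.
		time.Sleep(sleepInterval)
	}
}
```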