etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0
46.77k stars 9.64k forks source link

Experiments with replacing index implementation of `btree with key -> revisions` with `revision list of immutable trees` #18184

Open serathius opened 1 week ago

serathius commented 1 week ago

Recently talked with Marcel from Cillium about his recent issue with etcd being overloaded with LIST with limit. Even with low limit, the list will range all the keys to count the total number of keys (not a great feature).

My idea was to improve performance of getting key count within range by using interval trees. For now I didn't managed to implement it as I have already started seeing issues finding a good implementation of immutable tree. For now I tried with AVL trees.

The results:

name                                   old time/op    new time/op    delta
IndexPutBase-12                          81.7ns ± 2%   674.5ns ± 6%   +725.35%  (p=0.000 n=10+10)
IndexPutLongKey-12                       82.4ns ± 1%   708.7ns ± 3%   +760.02%  (p=0.000 n=10+10)
IndexPutLargeKeySpace-12                  309ns ± 1%    1931ns ± 1%   +525.73%  (p=0.000 n=10+8)
IndexGetBase-12                          96.6ns ± 2%    57.7ns ± 3%    -40.24%  (p=0.000 n=10+10)
IndexGetRepeatedKeys-12                   700ns ±10%     383ns ± 3%    -45.24%  (p=0.000 n=10+10)
IndexGetAccurate-12                      98.2ns ± 5%    57.2ns ± 3%    -41.74%  (p=0.000 n=10+10)
IndexGetMisses-12                        94.3ns ± 3%    60.0ns ± 3%    -36.39%  (p=0.000 n=10+10)
IndexGetLongKey-12                        107ns ± 1%      67ns ± 2%    -37.77%  (p=0.000 n=8+9)
IndexGetLargeKeySpace-12                  347ns ± 1%    1135ns ± 2%   +226.70%  (p=0.000 n=10+10)
IndexRangeBase-12                         405ns ± 4%     512ns ± 2%    +26.67%  (p=0.000 n=10+8)
IndexRangeHighSelectivity-12              354ns ± 1%     452ns ± 2%    +27.61%  (p=0.000 n=10+10)
IndexRangeLongKey-12                      655ns ± 4%     616ns ± 5%     -6.00%  (p=0.001 n=10+10)
IndexRangeRepeatedKeys-12                3.94µs ± 7%    0.80µs ± 6%    -79.56%  (p=0.000 n=9+10)
IndexRangeLargeKeySpace-12                872ns ± 1%    2072ns ± 1%   +137.70%  (p=0.000 n=10+10)
IndexRevisionsBase-12                     296ns ± 3%     391ns ± 2%    +31.90%  (p=0.000 n=10+10)
IndexRevisionsHighSelectivity-12          241ns ± 1%     327ns ± 1%    +35.55%  (p=0.000 n=10+10)
IndexRevisionsLongKey-12                  496ns ± 7%     453ns ± 5%     -8.64%  (p=0.000 n=10+10)
IndexRevisionsRepeatedKeys-12            35.4µs ±18%     1.3µs ± 5%    -96.38%  (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12            702ns ± 1%    1901ns ± 1%   +170.71%  (p=0.000 n=10+10)
IndexRevisionsLimit-12                    244µs ± 3%     253µs ± 2%     +3.75%  (p=0.000 n=10+10)
IndexCountRevisionsBase-12                219ns ± 3%     323ns ± 4%    +47.27%  (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12     168ns ± 5%     252ns ± 5%    +49.64%  (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12             379ns ± 1%     358ns ± 6%     -5.32%  (p=0.002 n=8+10)
IndexCountRevisionsRepeatedKeys-12       3.97µs ±14%    0.46µs ± 7%    -88.41%  (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12       601ns ± 1%    1804ns ± 3%   +199.98%  (p=0.000 n=10+10)

Pros:

Cons:

The main blocker for this approach is reducing the cost of Puts and improving the

Even if the idea doesn't stick I expect we can keep the benchmarks.

k8s-ci-robot commented 1 week ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

serathius commented 1 week ago

Tried the btree with copy:

name                                   old time/op    new time/op    delta
IndexPutBase-12                          81.7ns ± 2%  1044.5ns ±22%  +1178.19%  (p=0.000 n=10+10)
IndexPutLongKey-12                       82.4ns ± 1%  1077.8ns ±20%  +1207.90%  (p=0.000 n=10+10)
IndexPutLargeKeySpace-12                  309ns ± 1%    3056ns ± 6%   +890.38%  (p=0.000 n=10+10)
IndexGetBase-12                          96.6ns ± 2%   139.7ns ± 1%    +44.65%  (p=0.000 n=10+10)
IndexGetRepeatedKeys-12                   700ns ±10%     822ns ± 8%    +17.47%  (p=0.000 n=10+10)
IndexGetAccurate-12                      98.2ns ± 5%   139.8ns ± 2%    +42.33%  (p=0.000 n=10+10)
IndexGetMisses-12                        94.3ns ± 3%   141.9ns ± 2%    +50.50%  (p=0.000 n=10+10)
IndexGetLongKey-12                        107ns ± 1%     157ns ± 3%    +46.44%  (p=0.000 n=8+10)
IndexGetLargeKeySpace-12                  347ns ± 1%    2053ns ± 2%   +491.18%  (p=0.000 n=10+10)
IndexRangeBase-12                         405ns ± 4%     443ns ± 3%     +9.58%  (p=0.000 n=10+10)
IndexRangeHighSelectivity-12              354ns ± 1%     340ns ± 2%     -4.02%  (p=0.000 n=10+10)
IndexRangeLongKey-12                      655ns ± 4%     725ns ± 7%    +10.62%  (p=0.000 n=10+10)
IndexRangeRepeatedKeys-12                3.94µs ± 7%    1.05µs ± 6%    -73.42%  (p=0.000 n=9+10)
IndexRangeLargeKeySpace-12                872ns ± 1%    2657ns ± 2%   +204.91%  (p=0.000 n=10+10)
IndexRevisionsBase-12                     296ns ± 3%     331ns ± 1%    +11.48%  (p=0.000 n=10+9)
IndexRevisionsHighSelectivity-12          241ns ± 1%     221ns ± 1%     -8.57%  (p=0.000 n=10+9)
IndexRevisionsLongKey-12                  496ns ± 7%     571ns ± 4%    +15.02%  (p=0.000 n=10+10)
IndexRevisionsRepeatedKeys-12            35.4µs ±18%     1.2µs ± 3%    -96.63%  (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12            702ns ± 1%    2457ns ± 2%   +249.89%  (p=0.000 n=10+10)
IndexRevisionsLimit-12                    244µs ± 3%     240µs ± 3%       ~     (p=0.123 n=10+10)
IndexCountRevisionsBase-12                219ns ± 3%     235ns ± 2%     +7.01%  (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12     168ns ± 5%     126ns ± 3%    -25.26%  (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12             379ns ± 1%     442ns ± 2%    +16.75%  (p=0.000 n=8+10)
IndexCountRevisionsRepeatedKeys-12       3.97µs ±14%    0.70µs ± 6%    -82.47%  (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12       601ns ± 1%    2290ns ± 3%   +280.76%  (p=0.000 n=10+10)

name                                   old alloc/op   new alloc/op   delta
IndexPutBase-12                            150B ± 0%     2148B ±18%  +1327.24%  (p=0.000 n=8+10)
IndexPutLongKey-12                         151B ± 1%     2076B ±16%  +1277.64%  (p=0.000 n=10+10)
IndexPutLargeKeySpace-12                   121B ± 0%     5125B ± 1%  +4135.79%  (p=0.000 n=7+10)
IndexGetBase-12                           64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetRepeatedKeys-12                   64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetAccurate-12                       64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetMisses-12                         64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetLongKey-12                        64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexGetLargeKeySpace-12                  64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexRangeBase-12                          558B ± 6%      493B ± 6%    -11.63%  (p=0.000 n=10+10)
IndexRangeHighSelectivity-12               623B ± 0%      560B ± 0%    -10.23%  (p=0.000 n=10+10)
IndexRangeLongKey-12                       558B ± 4%      480B ± 4%    -13.88%  (p=0.000 n=10+10)
IndexRangeRepeatedKeys-12                  552B ± 6%      468B ± 7%    -15.21%  (p=0.000 n=10+10)
IndexRangeLargeKeySpace-12                 556B ± 0%      494B ± 2%    -11.11%  (p=0.000 n=10+10)
IndexRevisionsBase-12                      265B ± 1%      196B ± 2%    -26.15%  (p=0.000 n=9+8)
IndexRevisionsHighSelectivity-12           288B ± 0%      224B ± 0%    -22.11%  (p=0.000 n=10+8)
IndexRevisionsLongKey-12                   263B ± 4%      198B ± 7%    -24.41%  (p=0.000 n=10+10)
IndexRevisionsRepeatedKeys-12              252B ± 5%      191B ±11%    -24.37%  (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12             262B ± 1%      198B ± 1%    -24.57%  (p=0.000 n=10+10)
IndexRevisionsLimit-12                     303B ± 0%      239B ± 0%    -21.12%  (p=0.002 n=8+10)
IndexCountRevisionsBase-12                64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12     64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12             64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsRepeatedKeys-12        64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12       64.0B ± 0%      0.0B        -100.00%  (p=0.000 n=10+10)

name                                   old allocs/op  new allocs/op  delta
IndexPutBase-12                            1.00 ± 0%      7.00 ± 0%   +600.00%  (p=0.000 n=10+10)
IndexPutLongKey-12                         1.00 ± 0%      7.00 ± 0%   +600.00%  (p=0.000 n=10+10)
IndexPutLargeKeySpace-12                   1.00 ± 0%     16.00 ± 0%  +1500.00%  (p=0.000 n=10+10)
IndexGetBase-12                            1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetRepeatedKeys-12                    1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetAccurate-12                        1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetMisses-12                          1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetLongKey-12                         1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexGetLargeKeySpace-12                   1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexRangeBase-12                          7.00 ± 0%      5.70 ±12%    -18.57%  (p=0.000 n=9+10)
IndexRangeHighSelectivity-12               7.40 ± 8%      6.50 ± 8%    -12.16%  (p=0.005 n=10+10)
IndexRangeLongKey-12                       7.00 ± 0%      5.70 ±12%    -18.57%  (p=0.000 n=10+10)
IndexRangeRepeatedKeys-12                  7.40 ± 8%      6.30 ±11%    -14.86%  (p=0.001 n=10+10)
IndexRangeLargeKeySpace-12                 5.00 ± 0%      4.00 ± 0%    -20.00%  (p=0.000 n=10+10)
IndexRevisionsBase-12                      4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.000 n=9+10)
IndexRevisionsHighSelectivity-12           4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.000 n=10+10)
IndexRevisionsLongKey-12                   4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.000 n=10+9)
IndexRevisionsRepeatedKeys-12              4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.000 n=10+10)
IndexRevisionsLargeKeySpace-12             3.00 ± 0%      2.00 ± 0%    -33.33%  (p=0.000 n=10+10)
IndexRevisionsLimit-12                     4.00 ± 0%      3.00 ± 0%    -25.00%  (p=0.002 n=8+10)
IndexCountRevisionsBase-12                 1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsHighSelectivity-12      1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsLongKey-12              1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsRepeatedKeys-12         1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
IndexCountRevisionsLargeKeySpace-12        1.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
serathius commented 1 week ago

Looking at btree code, looks like adding fast counting should not be hard as it already includes intervals.

codecov-commenter commented 1 week ago

Codecov Report

Attention: Patch coverage is 63.06306% with 41 lines in your changes missing coverage. Please review.

Project coverage is 56.15%. Comparing base (15f9a59) to head (df9d93b). Report is 2 commits behind head on main.

:exclamation: Current head df9d93b differs from pull request most recent head 6838500

Please upload reports for the commit 6838500 to get more accurate results.

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files | [Files](https://app.codecov.io/gh/etcd-io/etcd/pull/18184?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | Coverage Δ | | |---|---|---| | [server/storage/mvcc/kvstore.go](https://app.codecov.io/gh/etcd-io/etcd/pull/18184?src=pr&el=tree&filepath=server%2Fstorage%2Fmvcc%2Fkvstore.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c2VydmVyL3N0b3JhZ2UvbXZjYy9rdnN0b3JlLmdv) | `57.09% <ø> (-33.36%)` | :arrow_down: | | [server/storage/mvcc/revision.go](https://app.codecov.io/gh/etcd-io/etcd/pull/18184?src=pr&el=tree&filepath=server%2Fstorage%2Fmvcc%2Frevision.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c2VydmVyL3N0b3JhZ2UvbXZjYy9yZXZpc2lvbi5nbw==) | `76.59% <80.00%> (-18.28%)` | :arrow_down: | | [etcdutl/snapshot/v3\_snapshot.go](https://app.codecov.io/gh/etcd-io/etcd/pull/18184?src=pr&el=tree&filepath=etcdutl%2Fsnapshot%2Fv3_snapshot.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-ZXRjZHV0bC9zbmFwc2hvdC92M19zbmFwc2hvdC5nbw==) | `14.53% <64.00%> (-34.38%)` | :arrow_down: | | [server/storage/mvcc/index.go](https://app.codecov.io/gh/etcd-io/etcd/pull/18184?src=pr&el=tree&filepath=server%2Fstorage%2Fmvcc%2Findex.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c2VydmVyL3N0b3JhZ2UvbXZjYy9pbmRleC5nbw==) | `54.05% <60.52%> (-37.62%)` | :arrow_down: | ... and [151 files with indirect coverage changes](https://app.codecov.io/gh/etcd-io/etcd/pull/18184/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) ```diff @@ Coverage Diff @@ ## main #18184 +/- ## =========================================== - Coverage 68.75% 56.15% -12.60% =========================================== Files 416 416 Lines 35128 35126 -2 =========================================== - Hits 24152 19726 -4426 - Misses 9569 14007 +4438 + Partials 1407 1393 -14 ``` ------ [Continue to review full report in Codecov by Sentry](https://app.codecov.io/gh/etcd-io/etcd/pull/18184?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://app.codecov.io/gh/etcd-io/etcd/pull/18184?dropdown=coverage&src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None). Last update [15f9a59...6838500](https://app.codecov.io/gh/etcd-io/etcd/pull/18184?dropdown=coverage&src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None).