etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

Make fragment in client watch request default #18965

Open serathius opened 1 day ago

serathius commented 1 day ago

What would you like to be added?

Fragment is a client-controlled Watch feature introduced in etcd v3.4. Proto

If configured by the client, it allows the server to split a Watch response into chunks bounded by a maximum size. Instead of sending one big response, etcd sends subsets of the events in responses marked as fragmented. Only the last response is left unmarked, which lets the client know that the response is complete.

From the user's perspective, enabling fragmenting doesn't cause any changes. All fragmenting is handled within the client code, which combines the fragmented responses into a single one. It's a protocol optimization that is not visible to the user.
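The client-side reassembly can be sketched like this. It is an illustration under assumptions rather than the etcd client's actual code: the `Event` and `Response` types and the `assembler` helper are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for a watch event.
type Event struct{ Key string }

// Response is a hypothetical stand-in for a watch response.
type Response struct {
	Events   []Event
	Fragment bool // true when more responses for the same batch follow
}

// assembler buffers events from fragmented responses until the
// unmarked (final) response of the batch arrives.
type assembler struct{ pending []Event }

// add consumes one response from the stream. It returns the combined
// response and true once the batch is complete, or false while fragments
// are still outstanding.
func (a *assembler) add(r Response) (Response, bool) {
	a.pending = append(a.pending, r.Events...)
	if r.Fragment {
		return Response{}, false
	}
	full := Response{Events: a.pending}
	a.pending = nil
	return full, true
}

func main() {
	stream := []Response{
		{Events: []Event{{Key: "a"}, {Key: "b"}}, Fragment: true},
		{Events: []Event{{Key: "c"}}, Fragment: false},
		{Events: []Event{{Key: "d"}}, Fragment: false},
	}
	var a assembler
	for _, r := range stream {
		if full, ok := a.add(r); ok {
			fmt.Printf("delivered %d events\n", len(full.Events))
		}
	}
}
```

The caller only ever sees complete responses: the first two wire responses are delivered as one response with three events, and the unfragmented third one passes through unchanged.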

Marking responses as fragmented seems to be always beneficial. The only case that needs care is when multiple events in a single response share a revision; there, handling the fragments correctly is critical to maintain the atomic watch property.

As part of this issue, I would like to recommend that K8s use it.

Why is this needed?

Make etcd more reliable during watch congestion.

In tests run with `go run ./tools/benchmark/main.go watch-latency --watch-per-stream 1000 --streams 1 --put-total 200 --val-size 100000`, we noticed around a 20% reduction in memory used and an almost 2x improvement in latency when the watch becomes congested.

| object size [KB] | Events/s | Fragment | etcd memory [MB] | 50%ile | 90%ile | 99%ile |
|---|---|---|---|---|---|---|
| 5 | 100646.0428 | FALSE | 111.72 | 0.1005 | 0.1543 | 0.1856 |
| 5 | 100887.1057 | TRUE | 110.44 | 0.1005 | 0.1547 | 0.1866 |
| 10 | 78963.1236 | FALSE | 1103.924 | 2.7177 | 7.3661 | 11.9127 |
| 10 | 87156.2031 | TRUE | 783.276 | 1.9748 | 5.2995 | 9.2281 |
| 100 | 6576.9363 | FALSE | 5138.712 | 14.0618 | 26.2721 | 29.1691 |
| 100 | 10619.2212 | TRUE | 4144.832 | 8.5963 | 15.969 | 17.8447 |

serathius commented 1 day ago

cc @ahrtr @dead2k @jpbetz @wojtek-t