aliher1911 closed this issue 1 year ago
cc @cockroachdb/replication
To measure how writes are affected by clear range, we need to run an experiment where the ranges receiving writes contain enough consecutive garbage to exceed the kv.gc.clear_range_min_keys
threshold. Only garbage sequences (versions of the same key, or runs of different keys) longer than that threshold are considered for clear range operations.
We also need a write throughput that would not overload the cluster by itself.
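The threshold in question is a cluster setting; a minimal sketch of how the two configurations compared below might be set up (assuming, per the discussion further down, that 2000 is the default and that 0 disables clear range entirely):

```sql
-- "Clear range enabled" runs: the default threshold discussed below.
SET CLUSTER SETTING kv.gc.clear_range_min_keys = 2000;

-- "Clear range disabled" runs: 0 is assumed to turn the feature off.
SET CLUSTER SETTING kv.gc.clear_range_min_keys = 0;
```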
To measure how write throughput is affected, we use the KV0 workload on a 3-node cluster of n1-standard-4 machines (4 vCPU / 16 GB) and run the workload from a 4th node. Experiment steps:
The size of the initial import was chosen as --insert-count 500000000
so that the subsequent workload wouldn't break up garbage sequences before GC had a chance to collect the data. This didn't work in practice, as GC required hours to clean up, so the final experiments used a shorter cycle of --cycle-length 180000
during the workload run to keep keys sufficiently spaced for clear range requests to still be issued.
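A sketch of the workload invocation under those choices (the connection string and --read-percent are assumptions; the insert count and cycle length are the values quoted above):

```shell
# One-off import, sized so garbage sequences survive until GC runs.
cockroach workload init kv --insert-count 500000000 \
    'postgres://root@node1:26257?sslmode=disable'

# kv0 (write-only) run from the 4th machine; the bounded cycle keeps
# keys spaced so clear range requests can still be issued.
cockroach workload run kv --read-percent 0 --cycle-length 180000 \
    'postgres://root@node1:26257?sslmode=disable'
```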
Clear range enabled:
Clear range disabled:
There's very little difference in the number of writes per second, but it is noticeable that with clear range the write throughput falls under 2900 only once, while without clear range it drops below that multiple times. So write throughput is slightly higher with clear range.
Clear range enabled:
Clear range disabled:
At the same time, the 99.99 and 99.9 percentiles are worse when clear range is enabled: write latency is much more spiky, with spikes reaching 7.5s in both configurations.
Clear range enabled:
Clear range disabled:
Looking at overload, CPU goes up from 75% when only the workload is running to the 95% range when GC runs constantly in parallel with the workload. There's no significant difference between configurations, as nodes are close to capacity when running this much GC; read/write bytes, IOPS, and network traffic also show no significant difference.
Clear range enabled:
Clear range disabled:
One noticeable difference is the time taken by the GC queue to process all replicas: 127 minutes with clear range versus 171 minutes without. That gives us a 35% improvement in overall GC time while using 94% vs 92% CPU.
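As a quick sanity check on the 35% figure (the improvement is measured relative to the faster, clear-range run, i.e. how much longer the non-clear-range run took):

```python
with_clear_range = 127     # minutes, GC queue pass with clear range
without_clear_range = 171  # minutes, GC queue pass without clear range

# Extra time the non-clear-range run took, relative to the clear-range run.
improvement = (without_clear_range - with_clear_range) / with_clear_range
print(f"{improvement:.0%}")  # → 35%
```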
One unsuccessful experiment was running the workload without specifying a cycle length. New rows "poisoned" the range keyspace within the first 15 minutes, to the point that no more clear range requests could be issued because there were no more long garbage sequences. Within this 15-minute window GC failed clear range requests more and more until it stopped issuing them completely. No degradation of service was observed beyond GC re-running once more over the remaining data in replicas.
Another unsuccessful experiment was to lower kv.gc.clear_range_min_keys
to 150 to allow the experiment to run with less garbage. The outcome was bad enough to affect writes severely. Because GC issues only a single clear range per request, while point deletes can be batched many to a request, lowering the threshold significantly increased the number of requests needed to delete the same number of keys, which led to increased CPU usage and node overload from the extra requests.
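A toy model of that request-count effect (a sketch only; the point-delete batch size and run lengths are assumptions, not CockroachDB's actual constants):

```python
def gc_request_count(garbage_runs, clear_range_min_keys, point_delete_batch=100_000):
    """Estimate GC requests for a list of consecutive-garbage run lengths.

    Runs at or above the threshold each become their own clear range
    request; everything below it is point-deleted in large shared batches.
    """
    clear_ranges = sum(1 for run in garbage_runs if run >= clear_range_min_keys)
    point_keys = sum(run for run in garbage_runs if run < clear_range_min_keys)
    point_batches = -(-point_keys // point_delete_batch)  # ceiling division
    return clear_ranges + point_batches

# 10,000 garbage runs of ~500 keys each: below the 2000 default,
# but far above a 150 threshold.
runs = [500] * 10_000
print(gc_request_count(runs, clear_range_min_keys=2000))  # → 50 batched requests
print(gc_request_count(runs, clear_range_min_keys=150))   # → 10000 requests, one per run
```

Under the default threshold the runs are swept up into a handful of point-delete batches; with the low threshold every run becomes its own request, which mirrors the request-count blowup described above.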
> 99.99 and 99.9 percentiles are worse when clear range is enabled. Write latency is much more spiky.
Do you have a sense of why? Is it because the cycle length means that writes stall on clear range latches? Is it because the range tombstones affect write performance? And why do we see these spikes with clear range disabled, considering we're using latchless GC?
In any case, I'm not sure this workload is very representative, i.e. a cycle whose keys are continually rewritten, but only after the GC TTL has passed. Maybe if one considered a periodic ingest/truncate of a table, but in that case one would likely use TRUNCATE TABLE,
which would ingest into a different table ID.
> Another unsuccessful experiment was to lower kv.gc.clear_range_min_keys to 150 to allow the experiment to run with less garbage. The outcome was sufficiently bad to affect writes severely.
Do you have a sense of the threshold at which clear ranges become beneficial? Is 2000 a good default value?
> Do you have a sense of the threshold at which clear ranges become beneficial? Is 2000 a good default value?
It looks like 2000 is reasonable, as it doesn't increase the number of requests. When clear range is used we don't get a dramatic performance increase, but we can see severe degradation if the threshold is set too low. It will mostly help if you have thousands of versions per key or delete large chunks of keys.
> Do you have a sense of why? Is it because the cycle length means that writes stall on clear range latches?
I don't think there's any latch contention going on. In the test where writes interfere with clear range requests, the writes very quickly break up key sequences, so GC stops issuing clear ranges. I think the system gets less responsive because of the additional CPU usage, since collections are happening faster.
It is hard to come up with a workload that would verify key interference: if we write infrequently, we won't hit the same ranges as GC at the same time, and if we write frequently, we break up the garbage sequences too fast. And fixtures don't allow you to import history; they choke on overlapping keys.
Makes sense. I think we can close this out, thanks!
In https://github.com/cockroachdb/cockroach/pull/90830 GC got the ability to use clear range requests when removing multiple subsequent keys.
Those clear range requests require latches, which is a step back from the latchless GC introduced in 22.1. While the underlying assumption is that we won't have conflicts, because we won't use range removals when data is heavily updated, we need to ensure that is the case.
To verify, run the kv0 workload with row-level TTL or similar, with and without the clear range feature enabled, and compare performance.
Jira issue: CRDB-25108