cockroachdb/cockroach

gc: benchmark effects of clear range on write heavy workloads #98157

Closed aliher1911 closed 1 year ago

aliher1911 commented 1 year ago

In https://github.com/cockroachdb/cockroach/pull/90830 GC gained the ability to use clear range requests when removing runs of consecutive keys.

Those range requests require latches, which is a step back from the latchless GC introduced in 22.1. The underlying assumption is that we won't have conflicts because we won't use range removes when data is heavily updated, but we need to ensure that is the case.

To verify, run the kv0 workload with row-level TTL (or similar) both with and without the clear range feature enabled, and compare performance.
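
A minimal sketch of the toggle, assuming the `kv.gc.clear_range_min_keys` cluster setting controls the feature and that setting it to 0 disables it (an assumption), with a hypothetical row-level TTL table as the garbage source:

```sql
-- Hypothetical TTL table: expired rows become garbage for GC to remove.
CREATE TABLE kv_ttl (
    k BYTES PRIMARY KEY,
    v BYTES
) WITH (ttl_expire_after = '10 minutes');

-- Run one pass with clear range enabled (2000 is discussed below as the default) ...
SET CLUSTER SETTING kv.gc.clear_range_min_keys = 2000;
-- ... and one with it disabled (assuming 0 disables the feature), then compare.
-- SET CLUSTER SETTING kv.gc.clear_range_min_keys = 0;
```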

Jira issue: CRDB-25108

blathers-crl[bot] commented 1 year ago

cc @cockroachdb/replication

aliher1911 commented 1 year ago

Experiment setup

To measure how writes are affected by clear range, we need to run an experiment where ranges that receive writes contain enough consecutive garbage to exceed the kv.gc.clear_range_min_keys threshold. Only garbage sequences longer than that threshold, whether versions of the same key or runs of different keys, are considered for clear range operations.

We also need a write throughput that does not overload the cluster by itself.

To measure how write throughput is affected we use the kv0 workload on a 3 node cluster of n1-standard-4 machines (4 vCPU / 16 GB) and run the workload from node 4. Experiment steps (a command sketch follows the list):

  1. load the cluster with initial rows
  2. delete all rows
  3. start the workload at 3000 rows/sec
  4. after 10 minutes of the workload running and QPS ramping up, set gc.ttlseconds to 600
  5. stop the workload after 1h
  6. wait for the GC queue to reach zero
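
A sketch of those steps, assuming the stock kv workload schema (database `kv`, table `kv`) and a `$PGURL` connection string; flags other than the ones mentioned in this issue are assumptions:

```sh
# 1. initial load (size discussed below)
cockroach workload init kv --insert-count 500000000 "$PGURL"

# 2. delete all rows (one statement shown for brevity; a real run would
#    likely delete in batches)
cockroach sql --url "$PGURL" -e "DELETE FROM kv.kv WHERE true;"

# 3. write-only workload at ~3000 rows/sec (run in a separate terminal)
cockroach workload run kv --read-percent 0 --max-rate 3000 --duration 1h "$PGURL"

# 4. after ~10 minutes, lower the GC TTL so garbage becomes collectable
cockroach sql --url "$PGURL" -e \
  "ALTER TABLE kv.kv CONFIGURE ZONE USING gc.ttlseconds = 600;"

# 5./6. the workload exits after 1h; then watch the GC queue drain to zero.
```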

The size of the initial import was chosen as --insert-count 500000000 so that the subsequent workload wouldn't break up garbage sequences before GC had a chance to collect the data. This didn't work in practice, as GC required hours to clean up, so the final experiments used a shorter cycle of --cycle-length 180000 during the workload run to keep keys sufficiently spaced that clear range requests could still be issued.
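
For reference, a sketch of the adjusted run invocation with the shorter cycle (only --cycle-length and the 3000 rows/sec rate are from this issue; the other flags are assumptions):

```sh
# a 180k-key cycle keeps untouched key runs long enough for clear range
# to still be considered by GC
cockroach workload run kv --read-percent 0 --max-rate 3000 \
  --cycle-length 180000 --duration 1h "$PGURL"
```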

Experiment results

QPS comparison

Clear range enabled:

stmts-clear-range

Clear range disabled:

stmts-no-clear-range

There's very little difference in the number of writes per second, but it is noticeable that with clear range, write throughput falls below 2900 only once, while without clear range it drops below that multiple times. So write throughput is slightly higher with clear range.

SQL latency

Clear range enabled:

sql-latency-cr

Clear range disabled:

sql-latency-no-cr

At the same time, the 99.99 and 99.9 percentiles are worse when clear range is enabled, and write latency is much more spiky. Spikes reach 7.5s in both configurations.

CPU Usage

Clear range enabled:

cpu-clear-range

Clear range disabled:

cpu-no-clear-range

Looking at overall load, CPU goes up from 75% when only the workload is running to the 95% range when GC runs constantly in parallel with the workload. There's no significant difference between the two configurations, as nodes are close to capacity when running this much GC. There's also no significant difference in read/write bytes, IOPS, or network traffic.

GC Total Time

Clear range enabled:

gc-time-clear-range

Clear range disabled:

gc-time-no-clear-range

One noticeable difference is the amount of time taken by the GC queue to process all replicas: 127 minutes with clear range versus 171 minutes without. In other words, GC without clear range took roughly 35% longer (171/127 ≈ 1.35), while CPU usage was 94% vs 92%.

Other experiments

One of the unsuccessful experiments was running the workload without specifying a cycle length. New rows "poisoned" the key space within the first 15 minutes, to the point that no more clear range requests could be issued because there were no remaining long garbage sequences. Within those 15 minutes GC failed more and more clear range requests until it stopped issuing them completely. No degradation of service was observed beyond GC having to rerun on the remaining data in those replicas.

Another unsuccessful experiment was lowering kv.gc.clear_range_min_keys to 150 so that the experiment could run with less garbage. The outcome was bad enough to affect writes severely: because GC issues only a single range per request but can send multiple point deletes in one batch, the batch sizes measured in actual keys deleted increased significantly, which led to increased CPU usage and node overload from the extra requests.
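
For reference, that experiment amounts to the following setting change (150 is the value from the experiment; 2000 is discussed below as the default):

```sql
-- the unsuccessful low threshold; each clear range then covers far fewer keys
SET CLUSTER SETTING kv.gc.clear_range_min_keys = 150;
```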

erikgrinaker commented 1 year ago

99.99 and 99.9 percentiles are worse when clear range is enabled. Write latency is much more spiky.

Do you have a sense of why? Is it because the cycle length means that writes stall on clear range latches? Is it because the range tombstones affect write performance? And why do we see these spikes with clear range disabled, considering we're using latchless GC?

In any case, I'm not sure this workload is very representative, i.e. a cycle whose keys are continually rewritten but only after the GC TTL has passed. Maybe if one considered a periodic ingest/truncate of a table, but in that case one would likely use TRUNCATE TABLE which would ingest into a different table ID.

Another unsuccessful experiment was lowering kv.gc.clear_range_min_keys to 150 so that the experiment could run with less garbage. The outcome was bad enough to affect writes severely: because GC issues only a single range per request but can send multiple point deletes in one batch, the batch sizes measured in actual keys deleted increased significantly, which led to increased CPU usage and node overload from the extra requests.

Do you have a sense of the threshold at which clear ranges become beneficial? Is 2000 a good default value?

aliher1911 commented 1 year ago

Do you have a sense of the threshold at which clear ranges become beneficial? Is 2000 a good default value?

It looks like 2000 is reasonable, as it doesn't increase the number of requests. It feels like when clear range is used we don't get a dramatic performance increase, but we can have severe degradation if the threshold is set too low. It will mostly help if you have thousands of versions of a key or delete large chunks of keys.

Do you have a sense of why? Is it because the cycle length means that writes stall on clear range latches?

I don't think there's any latch contention going on. In the test where writes interfere with clear range requests, the key sequences break up very quickly, so GC stops issuing them. I think the system gets less responsive because of the additional CPU usage, since collections are happening faster.

It is hard to come up with a workload that would verify whether we have key interference: if we write infrequently we won't hit the same ranges as GC at the same time, and if we write frequently we break up the key sequences quickly. And fixtures don't allow you to import history; they will choke on overlapping keys.

erikgrinaker commented 1 year ago

Makes sense. I think we can close this out, thanks!