aadityasondhi opened 3 months ago
Just to clarify, TTL does not do a full table scan, per se. Each leaseholder node of the table scans only the ranges it holds the lease for. There are two levels of concurrency: GOMAXPROCS goroutines working concurrently. This parallelism is probably one of the reasons why read bandwidth can get saturated.
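To make the fan-out concrete, here is a minimal sketch of GOMAXPROCS-bounded range scanning. The `scanRanges` function, the slice-of-slices range representation, and the expiration-timestamp encoding are all illustrative assumptions, not the actual TTL job code:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// scanRanges fans out one goroutine per range held by this leaseholder,
// bounded by GOMAXPROCS via a semaphore channel, and returns the number
// of expired rows found. Hypothetical sketch: each range is modeled as a
// slice of per-row expiration timestamps (nanoseconds).
func scanRanges(ranges [][]int64, nowNanos int64) int {
	sem := make(chan struct{}, runtime.GOMAXPROCS(0)) // concurrency bound
	var wg sync.WaitGroup
	var mu sync.Mutex
	expired := 0
	for _, r := range ranges {
		wg.Add(1)
		sem <- struct{}{} // acquire a worker slot
		go func(rows []int64) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			n := 0
			for _, expiresAt := range rows {
				if expiresAt <= nowNanos {
					n++
				}
			}
			mu.Lock()
			expired += n
			mu.Unlock()
		}(r)
	}
	wg.Wait()
	return expired
}

func main() {
	ranges := [][]int64{{1, 5, 9}, {2, 8}, {10, 11}}
	fmt.Println(scanRanges(ranges, 8)) // rows with expiresAt <= 8
}
```

With up to GOMAXPROCS goroutines each issuing range scans at once, it is easy to see how aggregate read bandwidth on a node can be saturated.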
Also, each of the goroutines mentioned above deletes the keys from a range immediately after scanning it. Does the TTL code also need to integrate with the disk write traffic limiter?
We have seen full table scans that occur as part of row-level TTL cause overload by saturating the read bandwidth on a node.
Note: as of https://github.com/cockroachdb/cockroach/pull/124184, the default ttl_delete_rate_limit is now 100 (it used to be unlimited). To resolve this issue, we may want to set that cluster setting back to unlimited in the tests that cover it.
Jira issue: CRDB-37530
Epic: CRDB-37479