cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

admission: integrate snapshot ingest with CPU limiter #123682

Open aadityasondhi opened 5 months ago

aadityasondhi commented 5 months ago

In an internal test cluster with unbounded snapshot ingests, we found that the Store.HandleSnapshot function was consuming a lot of CPU. This eventually increased goroutine scheduler latency, which in turn caused spikes in SQL latency.

In an internal thread, we identified the elastic CPU limiter as the ideal solution for this work, since the work was impacting scheduler latency. However, snapshot ingest is not technically elastic work, so we would need to extend the CPU limiter to also handle regular traffic and to support pacing thresholds higher than 1ms.

CPU profile attached: cpuprof.2024-04-30T18_09_32.630.102.pprof.zip. Some metrics from when the overload happened can be found here.

Jira issue: CRDB-38467

Epic CRDB-42958

sumeerbhola commented 3 months ago

See the attached screenshot for another example of high CPU consumption when writing snapshot bytes.

Details in https://cockroachlabs.slack.com/archives/C01SRKWGHG8/p1721291600405649