cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

server: enable collecting CPU profiles at a lower rate to limit the CPU overhead of the profiling #75801

Open knz opened 2 years ago

knz commented 2 years ago

We'd like to dump periodic CPU profiles on every node, or at least when CPU usage spikes. That is, we'd like to reuse logic similar to what we already use to collect heap dumps and goroutine dumps (#75799).

Unfortunately, the pprof default profile rate (100Hz) causes a noticeable (1-2%) performance dip. Given that we usually need profiles when the CPU is overloaded, the additional cost of profiling is unwelcome.

So we'd like to explore a way to collect profiles at a lower sampling rate, to lower the overhead.

Sadly, pprof.StartCPUProfile(), which we currently use, hardcodes the rate at 100Hz.

We haven't yet found another way to do this short of forking pprof.

Jira issue: CRDB-12842

knz commented 2 years ago

We're going to add this as one more item that can motivate a custom go runtime extension.

tbg commented 2 years ago

related: https://github.com/cockroachdb/cockroach/issues/75799

knz commented 2 years ago

@felixge do you happen to have ideas of APIs we can use short of forking pprof?

felixge commented 2 years ago

@knz hey, sorry for the late reply, I'm digging my way out of a huge email backlog right now 🙈.

You should be able to call runtime.SetCPUProfileRate(10) to reduce the sampling rate by 10x (10Hz instead of the default 100Hz). Calling pprof.StartCPUProfile() afterwards will print a warning, but the warning is wrong and can be ignored. The requested sample rate should still work.

See https://github.com/golang/go/issues/42502 for more details and upcoming changes to this API.