Historical CPU profiles

Is your feature request related to a problem? Please describe.

The debug zip already includes historical heap profiles and goroutine dumps, gathered at certain times (I believe when goroutine count or memory consumption rises past a certain proportional threshold). CPU profiles have no equivalent: the only ways to get one are manually via the DB Console endpoint, or at the moment a debug zip is taken.

Since CPU spikes are often short, we would like CPU profiles to be gathered automatically when a spike occurs. The other option is to actively monitor the CPU percentage until a spike begins, and then quickly navigate to the correct place in the DB Console or run the curl command. Because CPU spike windows can be so short, this is not guaranteed to succeed either.

Currently, we lack observability into CPU consumption for short spikes, whereas sustained CPU increases are more easily observable.

Describe the solution you'd like

Something that collects CPU profiles automatically and saves them when we see a spike in CPU usage, perhaps with the threshold configurable by a cluster setting, similar to how it works for goroutine dumps and heap profiles.

Describe alternatives you've considered

Some sort of custom-made script that watches CPU consumption per node and "wakes up" to get a token and curl the correct endpoint for the correct node(s) when CPU rises above a certain proportional threshold (maybe compared to a rolling CPU average for each individual node). The concern is that this still wouldn't act fast enough to get decent CPU profiles for sufficiently short spikes.
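The detection half of such a script could look roughly like the sketch below, which compares each new CPU sample against a per-node rolling average. All names (`rollingAvg`, `isSpike`, the window size of 3, the factor of 2.0) are hypothetical, and the token fetch and HTTP request to the profile endpoint are deliberately left out.

```go
package main

import "fmt"

// rollingAvg keeps the most recent `size` CPU samples for one node.
type rollingAvg struct {
	samples []float64
	size    int
	next    int     // index of the oldest sample once the window is full
	sum     float64 // running sum of the samples in the window
}

func newRollingAvg(size int) *rollingAvg {
	return &rollingAvg{size: size}
}

// add records a sample, evicting the oldest one once the window is full.
func (r *rollingAvg) add(v float64) {
	if len(r.samples) < r.size {
		r.samples = append(r.samples, v)
	} else {
		r.sum -= r.samples[r.next]
		r.samples[r.next] = v
		r.next = (r.next + 1) % r.size
	}
	r.sum += v
}

// mean returns the average of the samples currently in the window.
func (r *rollingAvg) mean() float64 {
	if len(r.samples) == 0 {
		return 0
	}
	return r.sum / float64(len(r.samples))
}

// isSpike reports whether v exceeds the rolling baseline by the given
// proportional factor. It returns false until the window is full, so
// the first few samples cannot trigger a capture.
func (r *rollingAvg) isSpike(v, factor float64) bool {
	return len(r.samples) == r.size && v > factor*r.mean()
}

func main() {
	avg := newRollingAvg(3)
	for _, v := range []float64{0.20, 0.22, 0.21, 0.65, 0.23} {
		// Check against the baseline before folding the sample in.
		if avg.isSpike(v, 2.0) {
			fmt.Printf("spike at %.2f (baseline %.2f): fetch a profile now\n", v, avg.mean())
		}
		avg.add(v)
	}
}
```

Even with instant detection, the script can only react as fast as its polling interval plus the time to authenticate and issue the request, which is the latency concern described above.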

@thtruo

Jira issue: CRDB-21194

Epic CRDB-20791

cockroachdb / cockroach

Historical CPU profiles #91299