cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

kvserver/load: add logging for top k replicas by load #98108

Closed kvoli closed 8 months ago

kvoli commented 1 year ago

There is currently no insight into historic hot replica information. The hot ranges API and UI page contain only current hot replica information.

It is desirable to view the historic top k hottest replicas over time for debugging issues with rebalancing, e.g. debugging the "load" of a replica after transferring its lease and the "load" of the new leaseholder replica, or correlating replica and lease rebalancing with hotness.

This issue is to add periodic logging of the top k hottest ranges on a store.

The logging should include a minimum of:

The logging should be made to the kvDistribution log channel.

The logging should be at a frequency of 10 minutes.

The k value (how many replicas are included) should depend on the load on the store, i.e. in periods of heavy load, k is high; in periods of no to low load, k scales to 0.
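
A minimal sketch of the load-dependent k selection described above, not the actual kvserver implementation. The `replicaLoad` type, the `chooseK` and `topKByLoad` helpers, and all thresholds (`minStoreQPS`, `qpsPerSlot`, `maxK`) are hypothetical names and values chosen for illustration; using QPS as the load signal is also an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// replicaLoad is a hypothetical record of one replica's load; the real
// kvserver types differ.
type replicaLoad struct {
	RangeID int64
	QPS     float64
}

// chooseK scales the number of logged replicas with store load: 0 when the
// store is at or below minStoreQPS, then one more slot per qpsPerSlot of
// additional load, capped at maxK. All three thresholds are assumed values.
func chooseK(storeQPS float64) int {
	const (
		minStoreQPS = 100.0
		qpsPerSlot  = 500.0
		maxK        = 10
	)
	if storeQPS <= minStoreQPS {
		return 0
	}
	k := 1 + int((storeQPS-minStoreQPS)/qpsPerSlot)
	if k > maxK {
		k = maxK
	}
	return k
}

// topKByLoad returns the k hottest replicas, hottest first, without
// mutating the input slice.
func topKByLoad(replicas []replicaLoad, k int) []replicaLoad {
	sorted := append([]replicaLoad(nil), replicas...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].QPS > sorted[j].QPS })
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

func main() {
	replicas := []replicaLoad{{1, 50}, {2, 900}, {3, 300}, {4, 10}}
	var storeQPS float64
	for _, r := range replicas {
		storeQPS += r.QPS
	}
	// In a real store this would run on a 10-minute ticker and write to the
	// kvDistribution log channel; here it prints once to stdout.
	for _, r := range topKByLoad(replicas, chooseK(storeQPS)) {
		fmt.Printf("r%d qps=%.0f\n", r.RangeID, r.QPS)
	}
}
```

Deriving k from aggregate store load keeps the log quiet on idle stores while still capping the output on very hot ones; the exact scaling function would need tuning against real workloads.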

Jira issue: CRDB-25081

blathers-crl[bot] commented 1 year ago

Hi @kvoli, please add branch-* labels to identify which branch(es) this release-blocker affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

kvoli commented 1 year ago

Going to hold off starting on this issue until https://github.com/cockroachdb/cockroach/pull/89511 is merged. There is some overlap.

kvoli commented 1 year ago

I'm going to remove the GA-blocker for this. I would prefer re-using the data structures/code in https://github.com/cockroachdb/cockroach/pull/89511 rather than running a different solution. That PR also needs to be merged into master and backported, so this work seems less likely to get into 23.1 GA.

I'm going to tentatively target 23.1.1 release.

kvoli commented 9 months ago

It would be ideal to re-use or build on top of @koorosh's great work. For future implementers, see:

https://github.com/cockroachdb/cockroach/blob/eb4a9168389f341a573ac91b75e717ada0e3a04a/pkg/server/structlogging/hot_ranges_log.go#L81-L81