kv,storage: request evaluation metrics #65414

Open sumeerbhola opened 3 years ago

sumeerbhola commented 3 years ago

This came up in the context of a support issue, where it was unclear what was causing the write load on a node.

@aayushshah15 @andreimatei @tbg

gz#8437

Jira issue: CRDB-7596

andreimatei commented 3 years ago

This all sounds useful to me.

We could plumb the request type into the context, which would allow the storage package to increment byte counts.

The one premature nit I have is to do whatever we do through a structured interface, not through magic context info.

tbg commented 3 years ago

I agree that request execution metrics would be useful, i.e. metrics like <method>.bytes_<read/written>. We need support from pebble for that. I don't think we need to tell pebble what we're executing; we just need a way to attach perf measurements to, say, an iterator. So something like this:

// in batcheval.EvaluateFoo:
stats := somepool.Get().(*pebble.IterStats)
it := eng.NewIterator(pebble.IterOptions{Stats: stats})
doSomething(it)
metrics.RecordStats(stats)
*stats = pebble.IterStats{} // reset before returning stats to the pool
somepool.Put(stats)

I'm not opposed in principle to using a Context here, but it would be too allocation-heavy anyway, so I don't think it's a viable option.

sumeerbhola commented 2 years ago

We need support from pebble for that.

I think for the write path, len(batch.Repr()) has always been sufficient. For ingestion, the sstable length is known to the caller. Even the read path now has sufficient detail in InternalIteratorStats.{BlockBytes,BlockBytesInCache}, which are exposed via storage.IteratorStats. We really should do something about the write path (including raft application and rebalancing snapshots). It is so mystifying to look at an overloaded store and have no clear idea where the load is coming from (https://cockroachlabs.atlassian.net/browse/SREOPS-4396 is an example of this).
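To make the write-path accounting concrete, here is a minimal sketch of per-method write-byte counters fed from the batch representation length. The package, type, and method names are hypothetical and not the actual batcheval plumbing:

// Hypothetical per-method write-byte accounting; the names here are
// illustrative only and do not correspond to real CockroachDB APIs.
package evalmetrics

import "sync/atomic"

// WriteBytes holds one counter per request method (e.g. Put, AddSSTable).
type WriteBytes struct {
	counters map[string]*atomic.Int64
}

func NewWriteBytes(methods []string) *WriteBytes {
	wb := &WriteBytes{counters: make(map[string]*atomic.Int64)}
	for _, m := range methods {
		wb.counters[m] = new(atomic.Int64)
	}
	return wb
}

// RecordBatch attributes the size of an evaluated write to a method. In the
// real code this would be fed len(batch.Repr()) after evaluation, or the
// sstable length for ingestions.
func (wb *WriteBytes) RecordBatch(method string, numBytes int) {
	if c, ok := wb.counters[method]; ok {
		c.Add(int64(numBytes))
	}
}

Read bytes could presumably be attributed the same way by copying BlockBytes/BlockBytesInCache out of storage.IteratorStats once an iterator is done.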

tbg commented 2 years ago

Gave this a quick prototype here: https://github.com/cockroachdb/cockroach/pull/79031

I agree that all that's needed is some polish and maybe a bit of refactoring, especially for the read metrics. Since the writes are more important, we could do those as a first pass. The one possible complication is the added overhead of the repeated .Len() calls, for which we may want to find a better solution. Once we have this data source, there is also going to be interest in exposing it via the range status and perhaps having a hot range dashboard that indexes on these new dimensions.
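On the .Len() overhead specifically, one option (a sketch under assumed names, not what the prototype does) is to take a single before/after measurement of the batch around each command's evaluation instead of calling .Len() on every write:

// Hypothetical: measure a command's write volume with one Len() pair per
// evaluation rather than per write. Batch stands in for the storage batch;
// record is whatever metric sink ends up being used.
type Batch interface {
	Len() int // current size of the batch representation
}

func evalWithWriteBytes(b Batch, method string, record func(method string, bytes int64), eval func() error) error {
	before := b.Len()
	err := eval() // the command's evaluation appends its writes to b
	if delta := b.Len() - before; delta > 0 {
		record(method, int64(delta))
	}
	return err
}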

tbg commented 2 years ago

These metrics would also have been helpful for (internal) https://github.com/cockroachlabs/support/issues/1503 to determine why stores were filling up. We would've looked at the influx per range and would likely have noticed a particular TableID prefix receiving a large number of bytes in AddSSTable requests, which could then have been associated with a job active on that table. And even just seeing, at first glance, lots of AddSSTable requests ingested by that node could already have prompted us to pause all jobs, thereby addressing the immediate issue.

One question is whether we also ought to have below-raft metrics that make the influx of data observable at a per-range level. For example, if n1 is the leaseholder and proposes all of the ingestions, then n2 (a follower) wouldn't register anything on the metrics, since the ingestions reach it below raft. The prototype above touches on this in a TODO, but if we went as far as replicating the metrics, we could apply them on followers as well.

This amounts to saying that read metrics would be evaluation-time (as they should be) while write metrics would be apply-time (or at least log-append-time, though implementing log-append metrics on followers is annoying, so we would probably sweep that difference under the rug). An argument for eval-time is that a long uncommitted raft log can see sustained writes, but if nothing is ever applied, apply-time metrics would give a false impression. The question remains which metrics would then be useful to pinpoint where the influx on a slow follower originates. We need to think about this a little more.
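For the apply-time variant, the accounting could hang off the apply loop so that followers record the same influx as the leaseholder, roughly along these lines; the types are purely illustrative and not the actual replica apply machinery:

// Hypothetical apply-time accounting: every replica, leaseholder or
// follower, adds the size of each applied command's write batch and any
// ingested sstables to per-range counters. These counters could back the
// range status and a hot-range breakdown by influx.
type rangeApplyStats struct {
	appliedBytes  int64 // regular write batches applied to the state machine
	ingestedBytes int64 // AddSSTable ingestions applied below raft
}

func (s *rangeApplyStats) onApply(writeBatchLen, ingestedSSTLen int) {
	s.appliedBytes += int64(writeBatchLen)
	s.ingestedBytes += int64(ingestedSSTLen)
}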

tbg commented 2 years ago

Also related to https://github.com/cockroachdb/cockroach/issues/71169, in the sense that it too asks for a more granular breakdown of the work KV does (though that issue cares about latency).