Closed erikgrinaker closed 1 year ago
@erikgrinaker, can you confirm what your expectations are for the other values? What would you expect to be accumulated vs not accumulated from across each node? Currently, everything is accumulated.
I'd expect to see the range's logical pre-replication stats, basically the range's MVCC stats that you get from crdb_internal.range_stats(). We care about how many logical keys exist in the range, and their logical size.
Also, only the leaseholder is guaranteed to have the right values. Followers can be arbitrarily stale, and can't be relied on.
That's fair enough. Is there ever a use-case for what currently exists? I'm wondering how the current implementation got this far without anyone realizing until now. cc @knz
If I need to backtrack, that's ok. The fan-out machinery is good, but it needs to be reconfigured to only visit the leaseholders of the ranges that compose a requested span.
Is there ever a use-case for what currently exists?
It seems kind of misleading to me. I don't think anyone really cares about post-replication MVCC stats. For the disk usage stats, one might want the actual physical sum across replicas (if that's what the user cares about), but we have to be very clear about pre- vs. post-replication stats and logical vs. physical stats.
I touched briefly on this on a different issue, where I suggested fetching this information via a new KV RPC call, which would always route to the leaseholder and deal with retries and outages and stuff like that.
The issue was identified previously here: #108779. The issue you have identified, Erik, is only a symptom of it. However, it is a "serious" symptom: when I enumerated the concerns in #108779, it wasn't clear that we cared about SHOW RANGES returning stale MVCC stats.
I don't think the sum issue is covered there, but the stale stats definitely is. Or maybe this covers it:
it seems strange we hit all nodes with at least one replica. Should we not restrict the fanout to the leaseholders only?
This was intended to cover it:
only data from local replicas is collected, which may be stale
The main problem here is the summing of stats across replicas. The stale stats is also an issue, but staleness is sometimes a problem, while summing is always a problem.
The main problem here is the summing of stats across replicas. The stale stats is also an issue, but staleness is sometimes a problem, while summing is always a problem.
This feels like a reasonable threshold for release blocker / not release blocker.
key_count: 3000000
is not correct; there are 1 million keys in the range, not 3 million. Seems like it's adding up results from all replicas or something. @zachlite Is this related to your recent span stats work?
Jira issue: CRDB-30628 Epic: CRDB-30635