Closed: sumeerbhola closed this issue 2 years ago
Most of the bypassing requests are TruncateLog. In this workload these end up synchronized across the 1000+ ranges because the writes are randomly distributed across all ranges. I suspect this is less of a problem in the real world, and I don't have good simple ideas for how to fix it. One simple idea is to put an arbitrary bound (say 500 bytes) on the tokens consumed by requests that bypass admission control, even if the estimate is higher. But then we run the risk that, in situations where 500 bytes is too low and many requests are bypassing admission control, we will give out too many tokens. If we then adjust the estimate based only on requests that did not bypass admission control, it will compensate in the next cycle, but if no bypassing requests are received it will overcompensate. Essentially, it becomes tricky when the number of bypassing requests received per interval fluctuates wildly, which is what happens with these TruncateLog requests.
So I plan to do nothing unless we see this as a problem in real settings.
I came up with a better idea while working on https://github.com/cockroachdb/cockroach/pull/82813. We would use an estimate at admission time as usual, but when the work is done we would fix it up by calling
```go
func (q *StoreWorkQueue) AdmittedWorkDone(h StoreWorkHandle, doneInfo StoreWorkDoneInfo) error
```

where `StoreWorkDoneInfo` is defined as

```go
type StoreWorkDoneInfo struct {
	// For ingests, ActualBytes is the size of the sstables. For normal writes,
	// it is the size of the batch. If StoreWriteWorkInfo.WriteBytes > 0, it
	// must be equal to ActualBytes (that is the case where the bytes were known
	// at admission time).
	ActualBytes int64
	// ActualBytesIntoL0 <= ActualBytes. For normal writes this is the equality
	// relationship. For ingests, these are the (approximate) bytes that were
	// ingested into L0.
	ActualBytesIntoL0 int64
}
```
So we fix the estimation after request evaluation. We would also use this to eliminate the `fractionOfIngestIntoL0` estimation.
This issue encompasses estimation of writes at followers (regular writes and ingests), without which both our token estimation and token consumption become flawed.
The existing size estimation logic uses the same write-bytes adjustment for all requests. This means that when there are tiny writes that bypass admission control, they mistakenly consume all the tokens. An example: a kv0 workload using 64KB blocks, where admission control was turned off and then turned back on after the sublevel count exceeded 90. Even though there is a substantial number of byte tokens, none of them are being given out to regular requests. Also, the first estimate of +3.3 MiB/req is too high, since the admission control accounting by the WorkQueue was off for much of the interval, so the requests were undercounted.
Jira issue: CRDB-16503