cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.07k stars 3.8k forks source link

storage: consider removing readahead in MVCCIterate #66828

Open erikgrinaker opened 3 years ago

erikgrinaker commented 3 years ago

In #66682 we saw significant performance problems in Raft causing constant elections and stuck ranges. It turned out this was caused by the 1000 key readahead in MVCCIterate:

https://github.com/cockroachdb/cockroach/blob/8f5231d3443f372c88632c8b77c7d954890cc334/pkg/storage/mvcc.go#L2662-L2666

The caller had passed an iterator function that was supposed to terminate early (in that case after the first entry), but this is only called after the 1000 entries have been fetched:

https://github.com/cockroachdb/cockroach/blob/8f5231d3443f372c88632c8b77c7d954890cc334/pkg/storage/mvcc.go#L2679-L2686

In #66816 @nvanbenschoten notes:

I considered removing the readahead behavior in MVCCIterate entirely. I think it might have been added to avoid repeat CGo hops back when we had RocksDB. But now that we have Pebble and can manipulate an iterator cheaply, it can maybe be rejected. However, this would then require us to lift some logic from pebbleMVCCScanner into MVCCIterate, which I wasn't keen on doing.

@petermattis adds:

+1 to doing this. Or to removing MVCCIterate entirely. Now that we don't have to minimize cgo crossings I don't see the point of MVCCIterate vs using an MVCCIterator.

Jira issue: CRDB-8248

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!