cockroachdb / pebble

RocksDB/LevelDB inspired key-value database in Go
BSD 3-Clause "New" or "Revised" License
4.98k stars 458 forks source link

db: consider adding IterContext #4115

Open jbowens opened 1 month ago

jbowens commented 1 month ago

The iterator stack has a host of configuration and global state. Today a lot of this configuration exists outside the base.InternalIterator interface. This forces iterator construction to propagate this configuration, every internal iterator in the stack to save a copy, and every internal iterator to zero this state on Close. Iterator.Close has an unfortunately high cost on workloads that use an iterator for a one-time seek (#4049) in part due to zeroing these structures.

We might reduce CPU and reduce the memory used by an iterator by propagating some of this information down the iterator stack during an operation by pointer. For example, we could define a base.IterContext struct:

type IterContext struct {
    Compare    Compare
    Equal      Equal
    Split      Split
    LowerBound []byte
    UpperBound []byte
    Prefix     []byte
    Stats      InternalIteratorStats
    Context    context.Context
        Logger     Logger
        Comparer   Comparer
}

Every iteration method of the InternalIterator interface could accept a *IterContext as the first parameter, which would point into a field of the pebble.Iterator struct when iterating using a top-level pebble.Iterator. With a quick survey, it looks like we would be able to remove:

There are some tricky bits where global state diverges from an individual iterator's state. For example the levelIter avoids propagating bounds if a sstable falls wholly within the bounds. I think we'd still see benefit because, for example, some iterators maintain both global-level bounds and bounds to use within a more constrained iteration context.

Additionally, we might see some benefit to keeping frequently accessed fields (cmp, split) shared by improving cache locality.

Also propagating the iteration prefix through Next will facilitate #3794, because today leaf iterators don't retain the iteration prefix.

Jira issue: PEBBLE-289

RaduBerinde commented 1 month ago

Maybe we can somehow tie the valueBlockReader allocation to the IterContext too - you would only be able to read a lazy value if the IterContext is still valid.