cockroachdb / pebble

RocksDB/LevelDB inspired key-value database in Go
BSD 3-Clause "New" or "Revised" License
4.78k stars 444 forks source link

LazyValue could provider a method which returns a ReadSeekCloser #3192

Closed mitar closed 8 months ago

mitar commented 8 months ago

For large values, currently .Get returns a []byte slice. But I think it would be beneficial to have a way to return a ReadSeekCloser so that one could read the value incrementally instead of copying it whole into the memory first.

My use case is to serve values using http.ServeContent which expects a ReadSeeker (and Closer is needed to release the value back to Pebble).

Maybe .GetReader could also be provided on DB and others which have .Get.

(Of course this makes sense when there is no compression on data. If there is compression, providing such reader is trickier as it is hard to seek inside compressed stream.)

(With no compression it might make sense to also have no cache for data? Is it possible to disable data cache?)

sumeerbhola commented 8 months ago

so that one could read the value incrementally instead of copying it whole into the memory first

The current Pebble implementation holds a whole decompressed sstable data block in-memory, so the value is already in-memory. In most cases, the pebble.Iterator is returning a slice pointing to within that data block (so there isn't a second copy). The LazyValue abstraction is currently only used for multiple versions in the multi-version concurrency control storage case (where older values are stored separately in value blocks), so that we can reduce IO when not reading those older versions (it was not meant to be a memory optimization).

mitar commented 8 months ago

The current Pebble implementation holds a whole decompressed sstable data block in-memory, so the value is already in-memory.

Yes, I am primarily thinking of this in the scenario when Pebble compression is disabled. The reason for this is because I would already be storing compressed values so I expect that there is little benefit for additional compression by Pebble. (I am planing to store values already compressed with different HTTP compression algorithms so that they are ready to be directly served to HTTP clients using content-encoding with compression.) And so if there is no compression and I would be streaming the value to the HTTP client, there might not be much need to first copy the whole value into the memory when it could be just mmaped from the disk to begin with.

sumeerbhola commented 8 months ago

As @jbowens mentioned in https://github.com/cockroachdb/pebble/issues/3191#issuecomment-1875807375 storing very large values in Pebble is not a goal for us, so there are no plans to try to avoid keeping a whole value in-memory.

mitar commented 8 months ago

I see. But I think this is not just about large values but also decreasing latency. If you first have to copy value into memory so that you can then copy it out, this increases the latency, if you could just directly stream it out.

sumeerbhola commented 8 months ago

not just about large values but also decreasing latency. If you first have to copy value into memory so that you can then copy it out, this increases the latency, if you could just directly stream it out.

I am not sure where the value is streaming out from -- directly from disk to the network? Anyway, synthetic benchmarks showing significant improvement are usually needed to make a compelling case -- otherwise such performance debates can often result in premature optimizations.

mitar commented 8 months ago

I am not sure where the value is streaming out from -- directly from disk to the network?

Yes. Like mmap a region of disk and then stream that to the network.

Anyway, synthetic benchmarks showing significant improvement are usually needed to make a compelling case -- otherwise such performance debates can often result in premature optimizations.

Sure, but I am probably not (yet) familiar with the Pebble codebase to be able to modify it enough to get this streaming proof-of-concept going to evaluate this. :-(