Closed: mitar closed this issue 8 months ago
> so that one could read the value incrementally instead of copying it whole into memory first
The current Pebble implementation holds a whole decompressed sstable data block in memory, so the value is already in memory. In most cases, the `pebble.Iterator` returns a slice pointing into that data block (so there isn't a second copy). The `LazyValue` abstraction is currently only used for multiple versions in the multi-version concurrency control storage case (where older values are stored separately in value blocks), so that we can reduce IO when those older versions are not being read; it was not meant as a memory optimization.
> The current Pebble implementation holds a whole decompressed sstable data block in-memory, so the value is already in-memory.
Yes, I am primarily thinking of the scenario where Pebble compression is disabled. The reason is that I would already be storing compressed values, so I expect little benefit from additional compression by Pebble. (I am planning to store values already compressed with different HTTP compression algorithms, so that they are ready to be served directly to HTTP clients using content-encoding.) And so, if there is no compression and I were streaming the value to the HTTP client, there might be no need to first copy the whole value into memory when it could simply be mmaped from disk to begin with.
As @jbowens mentioned in https://github.com/cockroachdb/pebble/issues/3191#issuecomment-1875807375, storing very large values in Pebble is not a goal for us, so there are no plans to try to avoid keeping a whole value in memory.
I see. But I think this is not just about large values but also about decreasing latency. If you first have to copy the value into memory so that you can then copy it out, that increases latency compared to streaming it out directly.
> not just about large values but also decreasing latency. If you first have to copy value into memory so that you can then copy it out, this increases the latency, if you could just directly stream it out.
I am not sure where the value is streaming out from -- directly from disk to the network? Anyway, synthetic benchmarks showing significant improvement are usually needed to make a compelling case -- otherwise such performance debates can often result in premature optimizations.
> I am not sure where the value is streaming out from -- directly from disk to the network?
Yes. Like mmap a region of disk and then stream that to the network.
> Anyway, synthetic benchmarks showing significant improvement are usually needed to make a compelling case -- otherwise such performance debates can often result in premature optimizations.
Sure, but I am probably not (yet) familiar enough with the Pebble codebase to modify it and get such a streaming proof-of-concept going to evaluate this. :-(
For large values, `.Get` currently returns a `[]byte` slice. But I think it would be beneficial to have a way to return a `ReadSeekCloser`, so that one could read the value incrementally instead of copying it whole into memory first. My use case is to serve values using `http.ServeContent`, which expects a `ReadSeeker` (and a `Closer` is needed to release the value back to Pebble). Maybe a `.GetReader` could also be provided on `DB` and on the other types which have `.Get`.

(Of course this makes sense when there is no compression on the data. If there is compression, providing such a reader is trickier, as it is hard to seek inside a compressed stream.)

(With no compression, might it also make sense to have no cache for data? Is it possible to disable the data cache?)