arv opened this issue 8 years ago
@cmasone-attic @kalman
It will make reading slower, but it has the advantage that you don't need to reserialize to get the hash.
How does buzhash factor into this?
On Tue, May 24, 2016 at 10:56 AM, Erik Arvidsson notifications@github.com wrote:
You're right that my PR relates to this... does it obviate the need for this? Are there cases this would cover where we can't just pull the hash from the Chunk being decoded?
It would also add the hash for sub-values. The question is how often we need the hash for inlined values.
ah, my eyes are opened
I think we should prefer the other direction and not only lazily compute hashes, but lazily decode values. I.e. https://github.com/attic-labs/noms/issues/2270
There are places where caching the hash of a value will matter a lot, such as sorting non-scalar values for insertion into collections, but I think those should be handled specially.
@arv, can we close this?
@rafael-atticlabs I don't see how lazily decoding values changes this. I can see that if we hold on to the chunk instead of creating the value and discarding the chunk, computing the hash would be a lot cheaper, since we would not need to encode the chunk again.
With the binary encoding we can compute the hash of sub values as we read the data.
When we start reading a value we start a new hash. As we read the data, we feed it both into the current "hashers" and into the decoder. When a value is complete we already have its hash, and we can set it on the object once and for all.
@rafael-atticlabs