Open arv opened 6 years ago
@rafael-atticlabs WDYT?
This actually cropped up in a profile @rafael-atticlabs and I were looking at yesterday, too. Your point about compression is well-taken, so...I'm probably in favor of this overall? One question: in your work to back Values directly with encoded chunk data, is that data compressed or not? I assume not. I think maybe we'd want to use something smaller than uint64 to reduce the memory hit while the Values are resident in memory.
A long time ago I was thinking about actually having multiple number encodings. We would use a different noms kind for numbers based on the actual value, and pick the smallest one that we need...
I'm not convinced which is better, though: always uintX, or kind+num.
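To make the kind+num idea concrete, here is a minimal sketch. The one-byte kind tags are hypothetical, purely for illustration, and are not Noms' actual NomsKind values:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical kind tags for illustration only; not Noms' actual NomsKind values.
const (
	kindUint8 byte = iota
	kindUint16
	kindUint32
	kindUint64
)

// encodeNumber writes a one-byte kind tag followed by the smallest
// little-endian fixed-width encoding that can hold v, and returns the
// number of bytes written.
func encodeNumber(buf []byte, v uint64) int {
	switch {
	case v <= 0xFF:
		buf[0] = kindUint8
		buf[1] = byte(v)
		return 2
	case v <= 0xFFFF:
		buf[0] = kindUint16
		binary.LittleEndian.PutUint16(buf[1:], uint16(v))
		return 3
	case v <= 0xFFFFFFFF:
		buf[0] = kindUint32
		binary.LittleEndian.PutUint32(buf[1:], uint32(v))
		return 5
	default:
		buf[0] = kindUint64
		binary.LittleEndian.PutUint64(buf[1:], v)
		return 9
	}
}

func main() {
	buf := make([]byte, 9)
	fmt.Println(encodeNumber(buf, 200))   // tag + uint8 = 2 bytes
	fmt.Println(encodeNumber(buf, 70000)) // tag + uint32 = 5 bytes
}
```

Unlike a varint, the decoder can read the tag and then decode a fixed-width value, so there is no per-byte continuation-bit loop, at the cost of one tag byte per number.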
I'm not totally sold on ditching varints yet. At best uint32/64 will compress to 3 bytes. Most common values for our quantities compress to 1 byte -- and there are a lot of them -- especially the number of fields in this struct. I'd like to consider all of our options for reducing the decoding/encoding expense here. For example, I think we might be able to "fast forward" over varints with less expense than fully decoding them.
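For illustration, "fast forwarding" over a varint only requires scanning for the first byte without the continuation (high) bit set -- no shifting or accumulation. A sketch in Go (`skipUvarint` is a hypothetical helper, not part of `encoding/binary`):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// skipUvarint returns the number of bytes occupied by the unsigned
// varint at the start of buf, without decoding its value. Every varint
// byte except the last has its high (continuation) bit set, so we just
// scan for the first byte below 0x80. Returns 0 if the varint is
// truncated.
func skipUvarint(buf []byte) int {
	for i, b := range buf {
		if b < 0x80 {
			return i + 1
		}
	}
	return 0
}

func main() {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, 300) // 300 encodes in 2 bytes
	fmt.Println(n, skipUvarint(buf)) // both report 2
}
```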
We use varints in a few "internal" places in the encoding for Value.
One reason we did this was because we thought varints would compress better.
Problems with using varints
Suggestion:
Use uint64
It might actually be fine to use uint32 or even uint16, since prolly tree depth is logarithmic in the number of elements in the total collection; with snappy, those extra zero bytes should be cheap enough that we should not care.
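The "extra zeros are cheap" claim can be sanity-checked with a small sketch: encode mostly-small quantities as fixed-width uint64s versus varints and compare compressed sizes. This uses gzip from the stdlib as a stand-in for snappy, so the absolute numbers will differ, but the shape of the result should be similar:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
)

// compressedLen compresses b with gzip (a stand-in here for snappy)
// and returns the compressed size in bytes.
func compressedLen(b []byte) int {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write(b)
	w.Close()
	return buf.Len()
}

func main() {
	// Mostly-small quantities, as with field counts and collection sizes.
	counts := make([]uint64, 1000)
	for i := range counts {
		counts[i] = uint64(i % 50)
	}

	// Fixed-width uint64 encoding: 8 bytes per value, mostly zero bytes.
	fixed := make([]byte, 8*len(counts))
	for i, v := range counts {
		binary.LittleEndian.PutUint64(fixed[i*8:], v)
	}

	// Varint encoding: 1 byte per value for these small quantities.
	varints := make([]byte, 0, len(counts))
	tmp := make([]byte, binary.MaxVarintLen64)
	for _, v := range counts {
		n := binary.PutUvarint(tmp, v)
		varints = append(varints, tmp[:n]...)
	}

	fmt.Println("fixed: ", len(fixed), "->", compressedLen(fixed))
	fmt.Println("varint:", len(varints), "->", compressedLen(varints))
}
```

The runs of zero bytes in the fixed-width form compress very well, which is the crux of the trade-off: the varint stream starts smaller, but the fixed-width stream gives most of that back after compression while keeping decoding trivial.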