Analyze varints and zigzag encoding - Githubissues

TWKB / Specification

An effort to come up with a compressed binary vector geometry specification

86 stars 10 forks source link

Analyze varints and zigzag encoding #1

Closed olt closed 11 years ago

olt commented 11 years ago

The 0.01 draft needs at least three bytes for all values above 128. Varint encoding as used by protobuf only uses two bytes for values between 128 and 16384.

https://developers.google.com/protocol-buffers/docs/encoding

nicklasaven commented 11 years ago

Well, the first one will need three because you need one byte to give the new size. But the coordinates after that uses only 2 bytes until next change.

nicklasaven commented 11 years ago

That link was very interesting about zig zag encoding. But if I understand right you sacrify one bit to tell there is a next byte. Is that correct? Then one byte will only give you the range from -64 to 63. Or do I miss the point?

olt commented 11 years ago

Ok, I missed that the size is for all following coordinates.

For signed integers it is right that on byte can store from -63 to 64 (IIRC). But you always store numbers in the smallest possible way and do not need to change the sizes back and forth.

nicklasaven commented 11 years ago

Yes, you are right. Very interesting. Here I think it comes down to statistics about real world data. What I have noticed is that it is quite big differences in overview maps and more detailed maps. More detailed maps often have only a few meters between the points and hen almost all vertex points is stored as 1 byte even with a precision of 1 decimal. With 1 decimal it can hold values from -12.7 to + 12.7.

olt commented 11 years ago

Yeah, only a comparison with real world data will show which method is better. But varints could be used for all numbers (IDs and also first coordinates) and should benefit for small geometries.

nicklasaven commented 11 years ago

Yes, that is an important point about ID and first coordinate.

Another question is what is fastest to parse in javascript, since that is another bottle neck.

hmm, it just struck me. With this varint, you will sacrifice one bit per byte, not only for the whole value. so 16 bits will only give -8191 to +8191 instead of -32767 to +32767 with INT16

nicklasaven commented 11 years ago

I have added varInt as mthod 1 in the spec and very soon in the PostGIS and javascript implementations.

It looks very promising :-)

nicklasaven commented 11 years ago

VarInt is now the one and only method in TWKB spec and PostGIS implementation