RoaringBitmap / RoaringFormatSpec

Specification of the compressed-bitmap Roaring format
http://roaringbitmap.org/
Apache License 2.0

code/data samples do not conform to the offset map spec #2

Closed glycerine closed 7 years ago

glycerine commented 7 years ago
  1. Offset header If and only if the cookie took value SERIAL_COOKIE and there are at least NO_OFFSET_THRESHOLD containers, then we store for each container (using a 32-bit value) the location (in bytes) of the container from the beginning of the stream (starting with the cookie).

The file github.com/RoaringBitmap/roaring/testdata/bitmapwithoutruns.bin appears to violate the "if-and-only-if" part of the spec: serialCookie is not present, and yet the offset map is still present.

!serialCookie, so after skip of isRun bitmap, and read of size, now at pos = 8
....
skipping/should be no offset header, because !serialCookie  pos is now 52

at key 0, filepos 52, reading an array container with card 66

at key 0, filepos 52, read an array container (card 66) = {96, 0, 228, 0, 296, 0, 8488, 0, 16680, 0, 24872, 0, 33064, 0, 41256, 0, 48040, 0, 56232, 0, 64424, 0, 0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000, 21000, 22000, 23000, 24000, 25000, 26000, 27000, 28000, 29000, 30000, 31000, 32000, 33000, 34000, 35000, 36000, 37000, 38000, 39000, 40000, 41000, 42000, 43000, }
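The first 22 values in that dump look like an offset map rather than container data. A minimal sketch of the check, assuming little-endian encoding and a descriptive header of 4 bytes per container (key plus cardinality); the helper name reinterpretAsOffsets is hypothetical and is not part of the roaring package:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// reinterpretAsOffsets (hypothetical helper) packs consecutive pairs of
// 16-bit words back into little-endian 32-bit values, which is how an
// offset header would be read from the same bytes.
func reinterpretAsOffsets(words []uint16) []uint32 {
	buf := make([]byte, 2*len(words))
	for i, w := range words {
		binary.LittleEndian.PutUint16(buf[2*i:], w)
	}
	offsets := make([]uint32, len(words)/2)
	for i := range offsets {
		offsets[i] = binary.LittleEndian.Uint32(buf[4*i:])
	}
	return offsets
}

func main() {
	// First 22 values of the "array container" printed in the log above.
	words := []uint16{96, 0, 228, 0, 296, 0, 8488, 0, 16680, 0, 24872, 0,
		33064, 0, 41256, 0, 48040, 0, 56232, 0, 64424, 0}
	offsets := reinterpretAsOffsets(words)
	fmt.Println(offsets)

	// Assumed layout: 8 bytes of cookie and size, then 4 bytes per
	// container of descriptive header, then 4 bytes per container of offsets.
	n := len(offsets)
	descriptiveEnd := 8 + 4*n
	dataStart := descriptiveEnd + 4*n
	fmt.Println(descriptiveEnd, dataStart)
}
```

Under these assumptions the reinterpreted values come out as 11 increasing offsets starting at 96, and the arithmetic gives 52 for the end of the descriptive header (matching the log) and 96 for the start of container data (matching the first offset).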
glycerine commented 7 years ago

(This is why the TestSerializationFromJava test was failing: I implemented the spec, but the other generators apparently do not follow it.) You probably just want to update the spec.

lemire commented 7 years ago

Let me check.

lemire commented 7 years ago

You are right, my prose was buggy... Here is the correct specification:

https://github.com/RoaringBitmap/RoaringFormatSpec/blob/master/README.md#3-offset-header
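As I understand the linked section, the offset header is present either when the cookie is SERIAL_COOKIE_NO_RUNCONTAINER, or when it is SERIAL_COOKIE and there are at least NO_OFFSET_THRESHOLD containers. A rough sketch of that rule follows; the function name hasOffsetHeader is hypothetical, and the numeric constant values are quoted from memory of the spec, so they should be checked against the README:

```go
package main

import "fmt"

// Constant values as recalled from the format spec (an assumption here;
// verify against the linked README).
const (
	serialCookieNoRunContainer = 12346
	serialCookie               = 12347
	noOffsetThreshold          = 4
)

// hasOffsetHeader (hypothetical helper) reports whether an offset header
// should follow the descriptive header, given the raw 32-bit cookie and
// the number of containers.
func hasOffsetHeader(cookie uint32, numContainers int) bool {
	if cookie == serialCookieNoRunContainer {
		// No run containers: the offset header is always written.
		return true
	}
	if cookie&0xFFFF == serialCookie {
		// With run containers, the low 16 bits hold the cookie and the
		// offset header is only written at or above the threshold.
		return numContainers >= noOffsetThreshold
	}
	return false
}

func main() {
	// The bitmapwithoutruns.bin case discussed above: no serialCookie, 11 containers.
	fmt.Println(hasOffsetHeader(serialCookieNoRunContainer, 11))
	// serialCookie with 3 and with 4 containers (high 16 bits hold size-1).
	fmt.Println(hasOffsetHeader(serialCookie|(2<<16), 3))
	fmt.Println(hasOffsetHeader(serialCookie|(3<<16), 4))
}
```

Read this way, a file without serialCookie carries the offset map, which is consistent with what the test data in this issue shows.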