RoaringBitmap / RoaringFormatSpec

Specification of the compressed-bitmap Roaring format
http://roaringbitmap.org/
Apache License 2.0
148 stars 14 forks source link

code doesn't match spec #1

Closed glycerine closed 7 years ago

glycerine commented 7 years ago

The 16 least significant bits of the 32-bit cookie have value SERIAL_COOKIE. In that case, the 16 most significant bits of the 32-it cookie are used to store the number of containers minus 1. That is, if you shift right by 16 the cookie and add 1, you get the number of containers. Let size be the number of containers. Then we store (size + 7) / 8 bytes, following the initial 64 bits,

The sample Java code starts the isRun bitmap immediately after the 32-bit cookie (so after the first 32-bits), whereas the spec, in the paragraph quoted above, says it should come after the first 64-bits.

    int startOffset = 0;
    boolean hasrun = hasRunContainer();
    if (hasrun) {
      out.writeInt(Integer.reverseBytes(SERIAL_COOKIE | ((size - 1) << 16)));
      byte[] bitmapOfRunContainers = new byte[(size + 7) / 8];
      for (int i = 0; i < size; ++i) {
        if (this.values[i] instanceof RunContainer) {
          bitmapOfRunContainers[i / 8] |= (1 << (i % 8));
        }
      }
      out.write(bitmapOfRunContainers);
lemire commented 7 years ago

Fixed in https://github.com/RoaringBitmap/RoaringFormatSpec/commit/50ab61f1921dda8d758bae8bcd665dd40cd8ed44

The cookie header in this case spans 32 bits.