RoaringBitmap / roaring

Roaring bitmaps in Go (golang), used by InfluxDB, Bleve, DataDog
http://roaringbitmap.org/
Apache License 2.0
2.52k stars 230 forks source link

`.RunOptimize()` is not idempotent #440

Closed tatyana-322 closed 3 months ago

tatyana-322 commented 3 months ago

In the cases below, .RunOptimize() always changes container from runContainer16{iv:interval16} to arrayContainer. If applied again, it will change arrayContainer back to runContainer16{iv:interval16} and so on.

My encoding workflow always calls .RunOptimize() before .MarshalBinary(). .MarshalBinary() result differs for arrayContainer and for runContainer16{iv:interval16}. Therefore, on EACH encoding workflow call, binary result will be different for the same contents in bitmap.

Do you mind to make .RunOptimize() idempotent ?

Case 1 with github.com/RoaringBitmap/roaring/v2 v2.3.1 :

import (
    "testing"

    "github.com/RoaringBitmap/roaring/v2"
    "github.com/stretchr/testify/require"
)

func TestBmap(t *testing.T) {
    a := roaring.NewBitmap()
    a.AddMany([]uint32{1, 2, 3})
    a.RunOptimize()
    b1, err := a.MarshalBinary()
    require.NoError(t, err)
    a.RunOptimize()
    b2, err := a.MarshalBinary()
    require.NoError(t, err)
    require.Equal(t, b1, b2) // <- test failed here
}

Case 2 with github.com/RoaringBitmap/roaring v1.9.4:

import (
    "testing"

    "github.com/RoaringBitmap/roaring"
    "github.com/stretchr/testify/require"
)

func TestBmap(t *testing.T) {
    a := roaring.NewBitmap()
    a.AddMany([]uint32{1, 2, 3})
    a.RunOptimize()
    b1, err := a.MarshalBinary()
    require.NoError(t, err)
    a.RunOptimize()
    b2, err := a.MarshalBinary()
    require.NoError(t, err)
    require.Equal(t, b1, b2) // <- test failed here
}
lemire commented 3 months ago

Looks like a bug. :-/

lemire commented 3 months ago

Yes. It is a bug.

lemire commented 3 months ago

We will issue a patch release soon.

lemire commented 3 months ago

Please update to the latest version.

tatyana-322 commented 3 months ago

@lemire after patch it works for case a.AddMany([]uint32{1, 2, 3}), but if you check same test but with a.AddMany([]uint32{1, 2, 3, 4}) - it will fail. I've checked similar sequences up to 10000, test fails for 4, 5, 6, 7, 8, 9, 10, 11, 12, 13

lemire commented 3 months ago

@tatyana-322 Thanks. The fix was not complete. I am issuing a new release.

tatyana-322 commented 3 months ago

@lemire on latest update (2.3.3) it started to fail on similar sequences ending with x >= 14: {1..14}; {1..15} ; ... ; {1..10000}; ...

lemire commented 3 months ago

@tatyana-322 You are correct.

lemire commented 3 months ago

See https://github.com/RoaringBitmap/roaring/releases/tag/v2.3.4