Open gedw99 opened 2 years ago
Yes, this looks very promising, but I also have some concerns.
Because Bluge directly uses the roaring bitmap inside our ice segment file format, we are sensitive to any changes in the disk format. But, this is somewhat mitigated by the fact that roaring bitmaps adheres to a disk format specification: https://github.com/RoaringBitmap/RoaringFormatSpec/
Will sroar offer any guarantees about file format stability, or how their version number will relate to file format changes? Maybe they will, I just couldn't find anything about this. Related to this, because the sroar shares a common disk/mem format, it just seems more likely that some future bug fix or improvement ends up altering the disk format as well.
Also, while the blog post covers all the good parts, looking at the github repo, there are several things they have skipped or omitted from their implementation. Of course they claim those aren't important bits, but we have to validate that for ourselves.
But, certainly the improvements are promising, and we should explore this further.
thanks @marty for having a look
It got them about a 30% speed increase just from this change.
"Will sroar offer any guarantees about file format stability ?" yes good point about binary compatibility. Wonder if the added a version field to the on disk format, so the runtime can sniff the version ? you just need to have multi version goang lib and then all good.
One problem we have is that the bluge_segment_api has a dependency on the roaring.Bitmap
struct directly. This was already an issue that worried us, and we wanted to rely on an interface instead. I had drafted a PR to update Bleve to fix this, and it should be a little easier in Bluge.
Also, I heard from @sreekanth-cb who looked at it from the Bleve perspective and he shared that a few high-level methods we use were missing (like HasNext, AdvanceIfNeeded). We should do a more thorough examination and go back to them with the list.
Thanks @mschoch for looking into this.
A perf increase will really help adoption I feel if of course this approach bears fruit
if push cones to show , can always add a migration to sniff the old on disk format and force an upgrade to the newer version . Just like a db typically does. O just another option . Of course doing a migration would be a matter of still having the old version in code .
https://github.com/dgraph-io/sroar
Background explainer 👍
https://dgraph.io/blog/post/serialized-roaring-bitmaps-golang/