RoaringBitmap / roaring-rs

A better compressed bitset in Rust
https://docs.rs/roaring/
Apache License 2.0
755 stars 83 forks source link

0.9.0 Tracking Issue #178

Open saik0 opened 2 years ago

saik0 commented 2 years ago

Here's what I think is still left before next release

Release notes

Perf numbers are for pairwise operations on collections of bitmaps from real datasets. Pairwise as in: B1 ∩ B2, B2 ∩ B3, ... BN-1 ∩ BN

saik0 commented 2 years ago

@Kerollmops

Thinking out loud: A major version bump would be a good time to bump rust edition to 2021

Edit: Plus, we've already bumped MSRV to min supported ver for 2021 edition

saik0 commented 2 years ago

Here are my benchmark comparisons for v0.8.1 to 49455d44db4d4dd4ab09cdda2ec0385399292ce7

Environment

Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz. Hyperthreading and frequency scaling disabled. Base Arch Linux install. Linux 5.16.5-arch1-1

Analysis

On rust stable (scalar only)

The good

  1. from_sorted_iter and append are exponentially faster
  2. iteration can be as much as 125% faster [1]
  3. Perf improved for all set ops [1]
    • intersection can be as much as 22% faster
    • union can be as much as 33% faster
    • difference can be as much as 40% faster
    • symmetric difference can be as much as 50% faster

[1] Depends on the dataset. Can be as little as zero

The bad

  1. deserialize_into is between 60-400% slower with data validation. We are planning on calling this out.
  2. container::len used to be field access, now it must branch on store type. This is a tradeoff we made to make cardinality tracking while flipping bits on bitmaps, which is part of why all the set ops are faster. However it has led to had some unintended downstream effects.
    • RoaringBitmap::len is slower by about 60%.
    • RoaringBitmap::serialize_into is about 10-20% slower
    • Absolute difference for both of the above is on the order of tens of micros per dataset of 200 bitmaps each. I think this is a great trade off given the wins for set ops can be measured in millis and are much more likely to be on hot code paths than serializing for most use cases.

Data

critcmp.zip

Notes

Analysis for SIMD coming later, but the data is in the zip if you want to peek

saik0 commented 2 years ago

SIMD

saik0 commented 2 years ago

@Kerollmops Time to get rid of union_with() (and others) deprecated in 0.6.7? (April 2021)

Kerollmops commented 2 years ago

Time to get rid of union_with() (and others) deprecated in 0.6.7? (April 2021)

You are right, I created issue #200 about that thank you for the reminder.

Here are my benchmark comparisons for v0.8.1 to https://github.com/RoaringBitmap/roaring-rs/commit/49455d44db4d4dd4ab09cdda2ec0385399292ce7

I am so much impressed by what you've done and it is only the benchmarks for x64_86 🎉

About the scalar version, you are right: the perf losses aren't much important as they are under the millisecond! About the SIMD version, that's impressive too! I was sure your work would break the law of physics 🚄 🚀

Once we release the new version of roaring I will create a GitHub release to talk about the work you have done, featuring Meilisearch just for its fame and the indirect help you bring to this project. We are so much awaiting this release of Roaring on the Meilisearch side, just to see those flamegraphs melt 🧊

Meilisearch and other projects intensively using roaring would also gain speed by using the new multi-ops API, however, it can be released in a future version as it will not be breaking!

saik0 commented 2 years ago

I am so much impressed by what you've done and it is only the benchmarks for x64_86 🎉

Thanks. I'm curious to see how the array_array SIMD does on other platforms

We are so much awaiting this release of Roaring on the Meilisearch side, just to see those flamegraphs melt

😂

I'm pleased to contribute to OSS again, it's been too long.

Meilisearch and other projects intensively using roaring would also gain speed by using the new multi-ops API, however, it can be released in a future version as it will not be breaking!

That reminds me. I'm going to create a planning ticket for next_version++

saik0 commented 2 years ago

@Kerollmops I kicked out #204. It can wait.

Do you think I should try to do a quick fix for #191 since the theme of this release is: 🚀

Kerollmops commented 2 years ago

Do you think I should try to do a quick fix for #191 since the theme of this release is: 🚀

You can indeed if you have already a good idea of how you would speed this up!

saik0 commented 2 years ago

You can indeed if you have already a good idea of how you would speed this up!

Just get rid of the in place operation and force an allocation like we have with union.

saik0 commented 2 years ago

@Kerollmops Release notes contains perf from unmerged xor change. AFAICT master is ready to ship as 0.9 once merged

saik0 commented 2 years ago

@Kerollmops Updated release notes for #187. Are we ready to bump the version and release?

Kerollmops commented 2 years ago

Yeah, I think we are ready to do so now. You are right that even if there are no breaking changes we should bump the version as we drastically impacted the performances of deserialize_from. I will do this either today or tomorrow.

Thank you for reminding me!

Kerollmops commented 2 years ago

Hum... @saik0,

I have a small issue when I try to publish the crate on crates.io: it seems that we can't publish a crate that depends on a git repository and that this dependency doesn't define the version of that crate that has been published to crates.io.

However, core_simd, doesn't seem to be available on crates.io.

$ cargo publish --dry-run
Updating crates.io index
error: all dependencies must have a version specified when publishing.
dependency `core_simd` does not specify a version
Note: The published dependency will use the version from crates.io,
the `git` specification will be removed from the dependency declaration.