saik0 commented 2 years ago

Here's what I think is still left before next release

[x] #129
[x] #167
[x] #135
[x] #176
[x] #177
[x] quantify perf delta for stable
[x] quantify perf delta for nightly+simd
[x] manually verify docs
- [x] #199
- [x] #202
[x] create release notes
[x] #193
[x] #194
[x] #200
[x] #205
[x] ~#204~ (deferred to next release)

Release notes

Breaking changes
- Removed deprecated set ops such as union_with. Use the corresponding operators |=.
- MSRV increased to 1.56
- deserialize_from validates inputs. It some cases it can be 4x slower. For workloads that are heavy in deserialization from trusted sources migrate to deserialize_unchecked_from
Performance optimizations
- from_sorted_iter and append are exponentially faster. They should be preferred over collect and extend whenever adding monotonically increasing integers to the set as it's about 2-2.5x faster.
- Other performance optimizations. Min Mean Max
  
  Iteration 6% 57% 125%
  
  And 0% 7% 22%
  
  Or 0% 10% 33%
  
  Sub 0% 9% 39%
  
  Xor 4% 90% 209%
New features
- rank Returns the number of integers that are <= value. rank(u64::MAX) == len()
- select Returns the nth integer in the set or None if n <= len()
- union_len intersection_len ... and so on. Compute the cardinality of a set operation without materializing it. Usually about twice as fast as materializing the bitmap.
- implemented DoubleEndedIterator
- implemented ExactSizeIterator (on 64 bit targets only)
- EXPERIMENTAL SIMD feature (requires rust nightly) Min Mean Max
  
  And 0% 34% 141%
  
  Or 2% 45% 145%
  
  Sub 0% 51% 168%
  
  Xor 0% 130% 437%
Other
- Added property testing suite for set operations
- Added many benchmarks and real world datasets

Other performance optimizations.		Min	Mean	Max
Iteration	6%	57%	125%
And	0%	7%	22%
Or	0%	10%	33%
Sub	0%	9%	39%
Xor	4%	90%	209%

EXPERIMENTAL SIMD feature (requires rust nightly)		Min	Mean	Max
And	0%	34%	141%
Or	2%	45%	145%
Sub	0%	51%	168%
Xor	0%	130%	437%

Perf numbers are for pairwise operations on collections of bitmaps from real datasets. Pairwise as in: B₁ ∩ B₂, B₂ ∩ B₃, ... B_N-1 ∩ B_N

saik0 commented 2 years ago

@Kerollmops

Thinking out loud: A major version bump would be a good time to bump rust edition to 2021

Edit: Plus, we've already bumped MSRV to min supported ver for 2021 edition

saik0 commented 2 years ago

Here are my benchmark comparisons for v0.8.1 to 49455d44db4d4dd4ab09cdda2ec0385399292ce7

Environment

Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz. Hyperthreading and frequency scaling disabled. Base Arch Linux install. Linux 5.16.5-arch1-1

Analysis

On rust stable (scalar only)

The good

from_sorted_iter and append are exponentially faster
iteration can be as much as 125% faster [1]
Perf improved for all set ops [1]
- intersection can be as much as 22% faster
- union can be as much as 33% faster
- difference can be as much as 40% faster
- symmetric difference can be as much as 50% faster

[1] Depends on the dataset. Can be as little as zero

The bad

deserialize_into is between 60-400% slower with data validation. We are planning on calling this out.
container::len used to be field access, now it must branch on store type. This is a tradeoff we made to make cardinality tracking while flipping bits on bitmaps, which is part of why all the set ops are faster. However it has led to had some unintended downstream effects.
- RoaringBitmap::len is slower by about 60%.
- RoaringBitmap::serialize_into is about 10-20% slower
- Absolute difference for both of the above is on the order of tens of micros per dataset of 200 bitmaps each. I think this is a great trade off given the wins for set ops can be measured in millis and are much more likely to be on hot code paths than serializing for most use cases.

Data

critcmp.zip

Notes

Difference between old deserialize_into and new deserialize_into_unvalidated is within noise threshold. This is not absent from the data, as previously there was no deserialize_into_unvalidated benchmarks that could be compared automatically. I had to validate this manually.
is_disjoint being ~15% slower appears to have just been noise. I wasn't able to reproduce this over multiple runs

Analysis for SIMD coming later, but the data is in the zip if you want to peek

saik0 commented 2 years ago

SIMD

intersection can be as much as 140% faster
union can be as much as 145% faster
difference can be as much as 168% faster
symmetric difference can be as much as 437% faster. mostly because #191

saik0 commented 2 years ago

@Kerollmops Time to get rid of union_with() (and others) deprecated in 0.6.7? (April 2021)

Kerollmops commented 2 years ago

Time to get rid of union_with() (and others) deprecated in 0.6.7? (April 2021)

You are right, I created issue #200 about that thank you for the reminder.

Here are my benchmark comparisons for v0.8.1 to https://github.com/RoaringBitmap/roaring-rs/commit/49455d44db4d4dd4ab09cdda2ec0385399292ce7

I am so much impressed by what you've done and it is only the benchmarks for x64_86 🎉

About the scalar version, you are right: the perf losses aren't much important as they are under the millisecond! About the SIMD version, that's impressive too! I was sure your work would break the law of physics 🚄 🚀

Once we release the new version of roaring I will create a GitHub release to talk about the work you have done, featuring Meilisearch just for its fame and the indirect help you bring to this project. We are so much awaiting this release of Roaring on the Meilisearch side, just to see those flamegraphs melt 🧊

Meilisearch and other projects intensively using roaring would also gain speed by using the new multi-ops API, however, it can be released in a future version as it will not be breaking!

saik0 commented 2 years ago

I am so much impressed by what you've done and it is only the benchmarks for x64_86 🎉

Thanks. I'm curious to see how the array_array SIMD does on other platforms

We are so much awaiting this release of Roaring on the Meilisearch side, just to see those flamegraphs melt

😂

I'm pleased to contribute to OSS again, it's been too long.

Meilisearch and other projects intensively using roaring would also gain speed by using the new multi-ops API, however, it can be released in a future version as it will not be breaking!

That reminds me. I'm going to create a planning ticket for next_version++

saik0 commented 2 years ago

@Kerollmops I kicked out #204. It can wait.

Do you think I should try to do a quick fix for #191 since the theme of this release is: 🚀

Kerollmops commented 2 years ago

Do you think I should try to do a quick fix for #191 since the theme of this release is: 🚀

You can indeed if you have already a good idea of how you would speed this up!

saik0 commented 2 years ago

You can indeed if you have already a good idea of how you would speed this up!

Just get rid of the in place operation and force an allocation like we have with union.

saik0 commented 2 years ago

@Kerollmops Release notes contains perf from unmerged xor change. AFAICT master is ready to ship as 0.9 once merged

saik0 commented 2 years ago

@Kerollmops Updated release notes for #187. Are we ready to bump the version and release?

Kerollmops commented 2 years ago

Yeah, I think we are ready to do so now. You are right that even if there are no breaking changes we should bump the version as we drastically impacted the performances of deserialize_from. I will do this either today or tomorrow.

Thank you for reminding me!

Kerollmops commented 2 years ago

Hum... @saik0,

I have a small issue when I try to publish the crate on crates.io: it seems that we can't publish a crate that depends on a git repository and that this dependency doesn't define the version of that crate that has been published to crates.io.

However, core_simd, doesn't seem to be available on crates.io.

$ cargo publish --dry-run
Updating crates.io index
error: all dependencies must have a version specified when publishing.
dependency `core_simd` does not specify a version
Note: The published dependency will use the version from crates.io,
the `git` specification will be removed from the dependency declaration.

RoaringBitmap / roaring-rs

0.9.0 Tracking Issue #178

Release notes

Environment

Analysis

The good

The bad

Data

Notes

SIMD