rinde commented 8 months ago

Problem

As part of my work we do regular benchmarks and I found that replacing Vec with NEVec somewhere caused a noticable slowdown. I started investigating and created some micro benchmarks. I found that NEVec is considerably slower than Vec which seems to be related to how the non-empty iterator is implemented.

Solution

I created an alternative implementation called NEVec2 that only implements a couple methods for benchmarking. NEVec2 is a simple wrapper for Vec that can, by proper encapsulation guarantee non-emptiness. For the API of NonEmptyIterator2 I went for the design that I also proposed in https://github.com/fosskers/nonempty-collections/pull/16. Based on my (limited) experience with this new design I think that it would allow to make a lot of code much simpler because it can forward a most of the calls directly to the underlying Vec while still upholding the non-empty guarantee. I believe that this is also what keeps the performance on par with the underlying Vec.

Benchmark results

Results obtained with cargo bench:

vec                 fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ contains_nevec                 │               │               │               │         │
│  ├─ 1             1.304 ns      │ 3.475 ns      │ 1.321 ns      │ 1.43 ns       │ 100     │ 1000000
│  ├─ 8             8.483 ns      │ 13.34 ns      │ 9.462 ns      │ 9.516 ns      │ 100     │ 1000000
│  ├─ 64            23.59 ns      │ 31.54 ns      │ 25.65 ns      │ 25.77 ns      │ 100     │ 1000000
│  ╰─ 1024          21.17 ns      │ 25.84 ns      │ 22.21 ns      │ 22.36 ns      │ 100     │ 1000000
├─ contains_nevec2                │               │               │               │         │
│  ├─ 1             0.425 ns      │ 2.025 ns      │ 0.458 ns      │ 0.501 ns      │ 100     │ 1000000
│  ├─ 8             3.208 ns      │ 5.229 ns      │ 3.216 ns      │ 3.319 ns      │ 100     │ 1000000
│  ├─ 64            10.64 ns      │ 15.37 ns      │ 10.83 ns      │ 11.12 ns      │ 100     │ 1000000
│  ╰─ 1024          10.64 ns      │ 13.09 ns      │ 10.99 ns      │ 11.09 ns      │ 100     │ 1000000
├─ contains_vec                   │               │               │               │         │
│  ├─ 1             0.421 ns      │ 0.708 ns      │ 0.454 ns      │ 0.464 ns      │ 100     │ 1000000
│  ├─ 8             3.212 ns      │ 6.321 ns      │ 3.329 ns      │ 3.392 ns      │ 100     │ 1000000
│  ├─ 64            10.64 ns      │ 13.61 ns      │ 10.67 ns      │ 10.89 ns      │ 100     │ 1000000
│  ╰─ 1024          10.63 ns      │ 14.1 ns       │ 10.65 ns      │ 10.92 ns      │ 100     │ 1000000
├─ map_nevec                      │               │               │               │         │
│  ├─ 1             3.646 ns      │ 6.421 ns      │ 3.783 ns      │ 3.888 ns      │ 100     │ 1000000
│  ├─ 8             25.1 ns       │ 34.77 ns      │ 26.78 ns      │ 27.17 ns      │ 100     │ 1000000
│  ├─ 64            80.47 ns      │ 122.1 ns      │ 90.51 ns      │ 90.89 ns      │ 100     │ 1000000
│  ╰─ 1024          1.392 µs      │ 1.831 µs      │ 1.581 µs      │ 1.585 µs      │ 100     │ 1000000
├─ map_nevec2                     │               │               │               │         │
│  ├─ 1             18.82 ns      │ 25.47 ns      │ 20.02 ns      │ 20.26 ns      │ 100     │ 1000000
│  ├─ 8             21.99 ns      │ 27.78 ns      │ 22.57 ns      │ 23.04 ns      │ 100     │ 1000000
│  ├─ 64            41.69 ns      │ 76.03 ns      │ 48.86 ns      │ 50.1 ns       │ 100     │ 1000000
│  ╰─ 1024          666.8 ns      │ 866.3 ns      │ 716.6 ns      │ 721.5 ns      │ 100     │ 1000000
╰─ map_vec                        │               │               │               │         │
   ├─ 1             19.75 ns      │ 26.01 ns      │ 20.05 ns      │ 20.37 ns      │ 100     │ 1000000
   ├─ 8             21.87 ns      │ 29.22 ns      │ 22.49 ns      │ 22.88 ns      │ 100     │ 1000000
   ├─ 64            45.82 ns      │ 67.69 ns      │ 52.06 ns      │ 52.81 ns      │ 100     │ 1000000
   ╰─ 1024          656.8 ns      │ 862.5 ns      │ 717.3 ns      │ 737.3 ns      │ 100     │ 1000000

As can be seen above, the old NEVec performs up to 2x slower than NEVec2 and plain Vec. The exception is when the data size is 1. This is because NEVec keeps the first element in a local variable which means it is on the stack and not on the heap.

I expect that similar performance drawbacks exist for the other datastructures.

Note that these numbers will change from run to run and from machine to machine. The conclusions, however, seem to be consistent.

Moving forward

I hope you are convinced that this warrants a change in the design of the crate.

For the future the vec2.rs file can be entirely discarded as I just created it for benchmarking purposes. I think it makes sense to adapt the internals of the datastructures and iterators such it works similar as in vec2.rs. Before I start on any of that I would like to hear your opinion of course 😄

fosskers commented 7 months ago

In general I am open to redesigns that improve performance / usability as you've demonstrated here.

I'm going on an extended trip next week however, and will be gone until the end of May. I'd like to revisit this in detail then, since I can't devote the time to it this week. How does that sound?

rinde commented 7 months ago

I understand. That is unfortunate (for me) since I hoped to getting this fixed soon, but nice for you to go on a long trip 👍 . I will probably continue working in a separate PR and maybe create a temporary fork so that we can use it internally. Are you still planning a release before you leave that includes the arrays PR?

Have a nice trip!

fosskers commented 7 months ago

I've just released 0.2.5 with the array improvements.

I will probably continue working in a separate PR and maybe create a temporary fork so that we can use it internally.

Please feel free, although let's definitely continue collaboration once I return. I am keen for a better design, and these next few weeks could be a good opportunity for you to test the new design internally.

rinde commented 7 months ago

Thank you for the release!

Please feel free, although let's definitely continue collaboration once I return. I am keen for a better design, and these next few weeks could be a good opportunity for you to test the new design internally.

Yes that is indeed my intent. I will create an improved version and test it internally. When you're back we can discuss my findings and see how we can best continue our collaboration to further improve the library.

fosskers commented 7 months ago

Excellent, then I'll see you when I'm back! The reason I haven't released a 1.0 yet is that I wasn't 100% convinced of the design, and I'm glad I've let it stew for as long as I have. I've otherwise been using it in production for 6 months or so without issue.

fosskers / nonempty-collections

Slow performance of `NEVec` and `NonEmptyIterator` #21

Problem

Solution

Benchmark results

Moving forward