AngelicosPhosphoros / keyed_priority_queue

MIT License
27 stars · 5 forks

HashMap updates is a huge perf drawback #3

Closed Ten0 closed 4 years ago

Ten0 commented 4 years ago

In order to avoid making O(log n) HashMap updates when changing a priority, the priority_queue library keeps an intermediate vector that maps indexes of the map to indexes of the heap (here). Each of those updates then becomes a much cheaper O(1) Vec write instead of a HashMap operation.
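A minimal sketch of that layout (hypothetical names, max-heap over i64 priorities; not priority_queue's actual code): the heap stores (key_index, priority) pairs, and a second Vec tracks where each key currently sits in the heap, so finding the entry to update is a plain indexed read.

```rust
// Sketch of the two-Vec layout: `positions[key]` always points at the
// heap slot currently holding that key, so lookups are O(1).
struct VecIndexedHeap {
    heap: Vec<(usize, i64)>, // (key_index, priority), max-heap by priority
    positions: Vec<usize>,   // key_index -> index in `heap`
}

impl VecIndexedHeap {
    fn new(n_keys: usize) -> Self {
        VecIndexedHeap { heap: Vec::new(), positions: vec![usize::MAX; n_keys] }
    }

    fn push(&mut self, key: usize, priority: i64) {
        self.heap.push((key, priority));
        self.positions[key] = self.heap.len() - 1;
        self.sift_up(self.heap.len() - 1);
    }

    /// O(1) position lookup (a Vec read, not a HashMap lookup), then an
    /// O(log n) restore of the heap property.
    fn update_priority(&mut self, key: usize, priority: i64) {
        let pos = self.positions[key];
        let old = self.heap[pos].1;
        self.heap[pos].1 = priority;
        if priority > old { self.sift_up(pos); } else { self.sift_down(pos); }
    }

    fn sift_up(&mut self, mut pos: usize) {
        while pos > 0 {
            let parent = (pos - 1) / 2;
            if self.heap[pos].1 <= self.heap[parent].1 { break; }
            self.swap_slots(pos, parent);
            pos = parent;
        }
    }

    fn sift_down(&mut self, mut pos: usize) {
        loop {
            let (l, r) = (2 * pos + 1, 2 * pos + 2);
            let mut largest = pos;
            if l < self.heap.len() && self.heap[l].1 > self.heap[largest].1 { largest = l; }
            if r < self.heap.len() && self.heap[r].1 > self.heap[largest].1 { largest = r; }
            if largest == pos { break; }
            self.swap_slots(pos, largest);
            pos = largest;
        }
    }

    /// Swap two heap slots and keep `positions` in sync: two O(1) Vec
    /// writes replace two HashMap updates per level of the sift.
    fn swap_slots(&mut self, a: usize, b: usize) {
        self.heap.swap(a, b);
        self.positions[self.heap[a].0] = a;
        self.positions[self.heap[b].0] = b;
    }
}

fn main() {
    let mut q = VecIndexedHeap::new(3);
    q.push(0, 10);
    q.push(1, 5);
    q.push(2, 7);
    q.update_priority(1, 20);
    println!("top = {:?}", q.heap[0]); // (1, 20)
}
```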

However, I like the idea of having more of the internals public (why didn't priority_queue make their iterators public?), using safe code, and splitting the heap implementation from the mapping storage. (Is that why you decided to recreate it?) But since the intermediate Vec gives better performance, it would still probably be better to have one inside the BinaryHeap structure.

I am currently looking for a library that allows updatable priorities when all my data is already remapped to indexes, so that I don't pay the additional cost of the HashMap (similar to what I implemented in C++ here: https://github.com/Ten0/updatable_priority_queue). If BinaryHeap were exposed with the previously mentioned Vec-inside optimization and KeyedPriorityQueue were just a wrapper around it that does the remapping with a HashMap, you would gain performance when using the HashMap and leave room for custom remappings that may be faster than the HashMap in some cases.

Another trick that halves the number of writes during a heapify operation: instead of swapping, keep the element you are moving down on the side, only move the other elements up as you go down, and finally write your element once at the bottom. (This is easy to do once all your elements are Clone, which they are here since internal remapping makes them usizes.)
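A sketch of that trick for Copy elements (hypothetical function; a safe variant of what the sift-down loop could look like):

```rust
/// Sift-down without swaps: keep the moving element aside, shift larger
/// children up (one write each, instead of a swap's two or three writes),
/// and write the element once into its final slot. Cheap here because the
/// elements are Copy (usize-like indices after remapping).
fn sift_down_copy(heap: &mut [i32], mut pos: usize) {
    let elt = heap[pos]; // keep the moving element on the side
    let len = heap.len();
    loop {
        let mut child = 2 * pos + 1;
        if child >= len { break; }
        if child + 1 < len && heap[child + 1] > heap[child] { child += 1; }
        if heap[child] <= elt { break; }
        heap[pos] = heap[child]; // move the child up: one write, not a swap
        pos = child;
    }
    heap[pos] = elt; // single final write
}

fn main() {
    let mut heap = vec![2, 7, 9, 1, 5, 8, 3];
    sift_down_copy(&mut heap, 0);
    println!("{:?}", heap); // [9, 7, 8, 1, 5, 2, 3]
}
```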

As a side note, I'm very surprised you need to unsafe impl Sync and Send on your structures: the compiler normally figures these out by itself, provided everything contained is Sync and Send. If that is indeed the case, you should never implement these yourself unless (a) they are not automatically derived by the compiler, (b) you understand why they are not derived, and (c) you have very good reason to believe your implementation guarantees memory safety despite whatever stopped the compiler from deriving them.
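For illustration (the `Queue` type is hypothetical): Send and Sync are auto traits, so a compile-time check is enough to confirm the compiler derives them on its own.

```rust
use std::collections::HashMap;

// Send and Sync are auto traits: the compiler implements them
// automatically for any type whose fields are all Send/Sync.
// No `unsafe impl` is needed for a struct like this.
struct Queue {
    heap: Vec<(i64, usize)>,
    map: HashMap<String, usize>,
}

// Compile-time check: this call only compiles because `Queue` is
// already Send + Sync through the auto-trait mechanism.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<Queue>();
    println!("Queue is Send + Sync without any unsafe impl");
}
```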

Ten0 commented 4 years ago

I've forked the lib at Ten0/keyed_priority_queue and implemented these. Here's my first draft of benchmarking on randomly-generated inserts and updates:

AngelicosPhosphoros commented 4 years ago

Hello @Ten0! Thank you for your contribution! Sorry for the long lack of response: I had very little time to set up detailed benchmarks, and since optimizing without measurements can be harmful, I postponed optimizations until now. I recreated this as keyed_priority_queue because I failed to find the priority_queue library :D

Now I have set up my benchmarks and I am ready to work. The benchmarking code is available in the develop branch now. About your enhancement suggestions:

  1. Using an internal usize-to-usize map as a Vec: a very good point, although it complicates the code quite a bit. I measured it and found that this optimization provides a great boost: 50%. I must implement this and make the next release.
  2. Using unsafe to decrease writes: I found this not very effective. It improves benchmark results by 1-3%, which is not worth introducing unsafe, at least at the current stage.
  3. unsafe impl Sync & Send: I didn't know that these are deduced by the compiler. Thank you!

If BinaryHeap could be exposed with the previously mentioned Vec-inside optimization...

OK, I will keep this in mind while reimplementing, but I can't promise anything. Maybe it will be added in version 0.2.0.

About your fork: it cannot be merged now because it makes too many changes to the API and uses IndexMap, which adds O(n) complexity to some operations that are currently O(log n).

Thank you very much for your contribution!

Ten0 commented 4 years ago

Hello! :)

I recreated this keyed_priority_queue because I failed to find the priority_queue library :D

Well, it's much nicer to use, it doesn't have thousands of useless unsafe lines, and I would have recreated it from scratch to get my performant version if I hadn't found yours ^^

About your comments:

  1. I'm not sure we have the same implementation in mind for that Vec remapping:
    • From my benchmarks, my version is actually 75% faster than the original implementation, not 50% (50% is when comparing to priority_queue), and gains another 33% when remapping through the index map is not needed because the keys are already usizes.
    • I did not find that it complicates the code very much, because it removes the update_fn complexity.
    • I see the global architecture as follows:
      • an updatable heap, without generic code for handling multiple types of keys and without update_fn (it just contains the two Vecs);
      • a wrapper around that heap that enables remapping from and to any type of key. I am under the impression that indexmap is exactly what corresponds to this problem.
  2. I didn't have time to benchmark it, and I agree that it is not worth introducing unsafe. It had to be implemented to be benchmarked anyway :) I'll remove it from my fork as well.
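The two-layer architecture described above can be sketched roughly as follows (all names hypothetical; the inner heap's internals are elided to a bare priorities Vec so the split itself stays visible):

```rust
use std::collections::HashMap;

// Layer 1 (sketch): a non-generic, usize-keyed updatable heap. The real
// thing would hold the two Vecs and the sift logic; any implementation
// exposing these two methods would fit.
struct UsizeHeap { priorities: Vec<i64> }

impl UsizeHeap {
    fn new() -> Self { UsizeHeap { priorities: Vec::new() } }
    /// Returns the usize index assigned to the new entry.
    fn push(&mut self, priority: i64) -> usize {
        self.priorities.push(priority);
        self.priorities.len() - 1
    }
    fn set_priority(&mut self, index: usize, priority: i64) {
        self.priorities[index] = priority; // a real impl would re-heapify
    }
}

// Layer 2: a thin wrapper that remaps arbitrary keys to usize indices.
// Users whose data is already indexed can skip this layer entirely.
struct KeyedQueue<K> { inner: UsizeHeap, indices: HashMap<K, usize> }

impl<K: std::hash::Hash + Eq> KeyedQueue<K> {
    fn new() -> Self { KeyedQueue { inner: UsizeHeap::new(), indices: HashMap::new() } }
    fn set(&mut self, key: K, priority: i64) {
        match self.indices.get(&key) {
            Some(&i) => self.inner.set_priority(i, priority),
            None => {
                let i = self.inner.push(priority);
                self.indices.insert(key, i);
            }
        }
    }
}

fn main() {
    let mut q = KeyedQueue::new();
    q.set("a", 1);
    q.set("b", 2);
    q.set("a", 5); // updates in place through the remapping layer
    println!("{:?}", q.inner.priorities); // [5, 2]
}
```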

About your fork: it cannot be merged now because it makes too many changes to the API and uses IndexMap

I was aware that my changes were not PR-ready yet, which is why I did not open a PR (in particular, I did not update the documentation). However:

Can you enlighten me on these two points?

AngelicosPhosphoros commented 4 years ago

Hi, Ten0!

  1. About the API: I would prefer to keep an API similar to the std heap and HashMap. The main changes I want to introduce are no panics and an Entry API. By no panics I mean that the queue must not panic on invalid keys; it would be much better to just return a Result. By Entry API I mean something like the HashMap Entry API: I will just wrap the hash map entry to allow optional pushes with a single hash map lookup. It allows more flexible interaction than your _opt methods, since the user can make any comparison with the value from the returned entry.

The other possible thing is making a usize-indexed version.

  2. Removing a key from the middle of an IndexMap costs O(n). But never mind: I have just realized, while writing this message, how it can be handled nicely :D In your version you just don't remove items from the indexmap and the intermediate vec, so you don't suffer from this. Maybe in your case that is acceptable, but in some others it is absolutely not.
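One way to make middle removal O(1) (an assumption about the fix meant here, not the repo's actual code) is the same idea as IndexMap's swap_remove: swap the last entry into the gap and patch its recorded index. A std-only sketch with a hypothetical helper, using Vec::swap_remove as the stand-in:

```rust
use std::collections::HashMap;

/// Remove `key` from the index layer in O(1): instead of shifting every
/// later entry down (O(n)), move the last key into the freed slot and
/// fix up that one key's recorded index.
fn remove_key(
    keys: &mut Vec<String>,                 // index -> key
    positions: &mut HashMap<String, usize>, // key -> index
    key: &str,
) -> Option<String> {
    let idx = positions.remove(key)?;
    let removed = keys.swap_remove(idx); // O(1): last key fills the gap
    if idx < keys.len() {
        // The moved key's index changed; patch its map entry.
        positions.insert(keys[idx].clone(), idx);
    }
    Some(removed)
}

fn main() {
    let mut keys = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    let mut positions: HashMap<String, usize> =
        keys.iter().cloned().zip(0..).collect();
    remove_key(&mut keys, &mut positions, "a");
    println!("{:?}", keys); // ["c", "b"]
}
```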

My current version, with 2 Vecs and 1 HashMap, improved the worst cases for the queue by 50-60% (3.5 ms, down from 7 ms) but made the best cases almost twice as slow (added 160 microseconds to 160) in my benches. I suppose that regression comes from the additional level of indirection in the O(1) cases.

Now I will try to implement a version with IndexMap and release an improved 0.1.3 version with a compatible API.

AngelicosPhosphoros commented 4 years ago

@Ten0 Also, if you decide to keep your unsafe hacks, the approach from std looks safer. It uses RAII to ensure that we do not produce UB even on panics.

Here is my variant for my queue: https://gist.github.com/AngelicosPhosphoros/93ecc4e88771de874b9909ce73730383
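For reference, a simplified reconstruction of that std-style "Hole" (condensed from the standard library's internal binary-heap helper; not the gist's exact code). While sifting, one slot is logically empty and the moving element lives outside the slice; the Drop impl writes it back, so the slice is valid again even if a comparison panics mid-sift.

```rust
use std::mem::ManuallyDrop;
use std::ptr;

/// A hole in a slice: `data[pos]` is logically uninitialized while the
/// removed element lives in `elt`. Dropping the hole fills it back in.
struct Hole<'a, T> {
    data: &'a mut [T],
    elt: ManuallyDrop<T>,
    pos: usize,
}

impl<'a, T> Hole<'a, T> {
    /// Safety: `pos` must be in bounds.
    unsafe fn new(data: &'a mut [T], pos: usize) -> Self {
        let elt = ptr::read(&data[pos]);
        Hole { data, elt: ManuallyDrop::new(elt), pos }
    }
    fn element(&self) -> &T { &self.elt }
    fn get(&self, index: usize) -> &T { &self.data[index] }
    /// Move `data[index]` into the hole; the hole moves to `index`.
    /// Safety: `index` must be in bounds and differ from `self.pos`.
    unsafe fn move_to(&mut self, index: usize) {
        let ptr = self.data.as_mut_ptr();
        ptr::copy_nonoverlapping(ptr.add(index), ptr.add(self.pos), 1);
        self.pos = index;
    }
}

impl<T> Drop for Hole<'_, T> {
    fn drop(&mut self) {
        // Fill the hole back in, even during unwinding.
        unsafe {
            ptr::copy_nonoverlapping(&*self.elt, &mut self.data[self.pos], 1);
        }
    }
}

/// Max-heap sift-down built on the hole: one read up front, one write
/// per level, and one final write when the hole drops.
fn sift_down(heap: &mut [i32], pos: usize) {
    let end = heap.len();
    unsafe {
        let mut hole = Hole::new(heap, pos);
        let mut child = 2 * hole.pos + 1;
        while child < end {
            if child + 1 < end && hole.get(child) < hole.get(child + 1) {
                child += 1;
            }
            if hole.element() >= hole.get(child) { break; }
            hole.move_to(child);
            child = 2 * hole.pos + 1;
        }
    } // `hole` drops here, writing the element into its final slot
}

fn main() {
    let mut heap = vec![1, 9, 8, 3, 4, 7, 2];
    sift_down(&mut heap, 0);
    println!("{:?}", heap); // [9, 4, 8, 3, 1, 7, 2]
}
```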

AngelicosPhosphoros commented 4 years ago

@Ten0 I implemented the queue using IndexMap. The main pro of this is a great simplification: it allows keeping almost the same code as in the old version, but with indices into the ordered map instead of keys. The other great advantage is that IndexMap removes the Clone requirement from keys and lowers memory usage for those keys. It also improves the timings of every kind of item removal from the queue (due to the lack of extra hashing on lookup while reassigning indices to keys).

But it has a big disadvantage: it is slower on pushes, sometimes up to 30% in comparison with my current optimized variant, or 130% in comparison with v0.1.2.

Ten0 commented 4 years ago

Also, if you decide to keep your unsafe hacks, the approach from std looks safer. It uses RAII to ensure that we do not produce UB even on panics.

This indeed feels much safer! I'm a bit disappointed that this Hole structure isn't available in std or in a library. By the way, I tried reverting to a fully safe version, and it showed a 7% performance decrease in my benchmarks. Using the Hole version also showed a decrease (which I find surprising), though a smaller one.

I'm probably going to do some more serious benchmarking before choosing between Hole and safe versions. :)

But it has a big disadvantage: it is slower on pushes, sometimes up to 30% in comparison with my current optimized variant or 130% in comparison with v0.1.2.

Hmm... interesting! If you find out why (or have already found out), I would be interested to know the cause. :)

Entry API

I like the Entry API for a lot of uses, and it would indeed bring the most flexibility; it would definitely be a plus! That said, whatever algorithm I was implementing, I have never needed anything conditional other than "add the key if the priority is greater, unless it has already been removed", so I like the idea of keeping these shortcuts. But it's very true that the code for these shortcuts would be much better if implemented via the Entry API.
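That shortcut can be written on top of std's HashMap Entry API with a single lookup; a sketch with a hypothetical helper (a real queue would also re-heapify after changing the priority):

```rust
use std::collections::hash_map::{Entry, HashMap};

/// "Add the key if the priority is greater" expressed via the Entry API:
/// one hash map lookup, and the caller-visible entry allows any other
/// comparison policy just as easily.
fn push_if_greater(map: &mut HashMap<String, i64>, key: String, priority: i64) {
    match map.entry(key) {
        Entry::Occupied(mut e) => {
            if *e.get() < priority {
                *e.get_mut() = priority; // a real queue would re-heapify here
            }
        }
        Entry::Vacant(e) => {
            e.insert(priority);
        }
    }
}

fn main() {
    let mut priorities: HashMap<String, i64> = HashMap::new();
    push_if_greater(&mut priorities, "task".into(), 3);
    push_if_greater(&mut priorities, "task".into(), 1); // ignored: 1 < 3
    push_if_greater(&mut priorities, "task".into(), 7);
    println!("{}", priorities["task"]); // 7
}
```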

AngelicosPhosphoros commented 4 years ago

I released version 0.1.3 with a fix for the HashMap updates drawback :)

AngelicosPhosphoros commented 4 years ago

@Ten0 Can you please run your benchmarks with my 0.1.3 version?

Ten0 commented 4 years ago

These are the performances I get with the current versions:

[src/main.rs:61] run_time = 847.497744ms <- priority_queue
[src/main.rs:61] run_time = 1.385735955s <- 0.1.2
[src/main.rs:61] run_time = 687.825406ms <- 0.1.3
[src/main.rs:61] run_time = 514.099194ms <- My version without `unsafe` (using swap)
[src/main.rs:61] run_time = 331.377943ms <- usize-keyed version without `unsafe`
[src/main.rs:61] run_time = 484.639453ms <- My version with the `Hole` circular move
[src/main.rs:61] run_time = 325.943746ms <- usize-keyed version with the `Hole` circular move
[src/main.rs:61] run_time = 462.673136ms <- My version with the unsafiest `unsafe`
[src/main.rs:61] run_time = 307.042517ms <- usize-keyed version with the unsafiest `unsafe`
AngelicosPhosphoros commented 4 years ago

I currently have very little time to develop a good implementation of a usize-keyed queue :(

I hope the biggest part of this issue was solved by releases 0.1.3 and 0.2.0, so I will close the issue for now.

AngelicosPhosphoros commented 3 years ago

I added the possibility to use custom hashers in 0.3.1. This allows users to use faster hashers if they need to.