Closed · Ten0 closed this 4 years ago
I've forked the lib at Ten0/keyed_priority_queue and implemented these. Here's my first draft of benchmarking on randomly-generated inserts and updates:
```text
Running `target/release/keyed_pq_perf_test`
priority-queue = 1.023745113s
keyed_priority_queue = 1.700685561s
keyed_priority_queue fork, Hash version = 532.572417ms
keyed_priority_queue fork, Vec version = 346.142458ms
priority-queue = 928.615817ms
keyed_priority_queue = 2.093646677s
keyed_priority_queue fork, Hash version = 526.908373ms
```
Hello @Ten0! Thank you for your contribution! Sorry for the long lack of response: I had very little time to set up detailed benchmarks, and optimizing without measurements can be harmful, so I postponed optimizations until now. I recreated this keyed_priority_queue because I failed to find the priority_queue library :D
Now I have set up my benchmarks and I am ready to work. The benchmarking code is available in the develop branch now.
So about your enhancement suggestions:
1) Using the internal map of usizes to usizes as a Vec. It is a very good point; however, it complicates the code considerably. I measured it and found that this optimization provides a great boost: 50%. I must implement this and make the next release.
2) Using unsafe to decrease writes. I found this not very effective: it improves benchmark results by 1-3%, which is not worth introducing unsafe, at least at the current stage.
3) unsafe impl Sync & Send: I didn't know that this is deduced by the compiler. Thank you!
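For illustration, here is a minimal compile-time check of that deduction. The `MyQueue` type is hypothetical (not the crate's actual struct); the point is that `Send` and `Sync` are auto traits, so no `unsafe impl` is needed when every field already has them:

```rust
use std::collections::HashMap;

// A queue-like struct made only of Send + Sync fields
// (hypothetical, for illustration).
#[allow(dead_code)]
struct MyQueue {
    heap: Vec<(u32, String)>,
    positions: HashMap<String, usize>,
}

// Compile-time check: Send and Sync are auto traits, derived
// automatically because every field is itself Send + Sync.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<MyQueue>();
    // A non-thread-safe field such as Rc<T> would break the derivation:
    // assert_send_sync::<std::rc::Rc<u32>>(); // does not compile
    println!("MyQueue is Send + Sync");
}
```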
> If `BinaryHeap` could be exposed with the previously mentioned `Vec`-inside optimization...

OK, I will keep this in mind while reimplementing, but I can't promise anything. Maybe it will be added in version 0.2.0.
About your fork: it cannot be merged now because it has too many changes in the API, and it uses IndexMap, which adds O(n) complexity to some operations that are currently O(log n).
Thank you very much for your contribution!
Hello! :)
> I recreated this keyed_priority_queue because I failed to find the priority_queue library :D
Well, it's much nicer to use, doesn't have thousands of useless unsafe lines, and I would have recreated it from scratch to get my performant version if I hadn't found yours ^^
About your comments:

[…] `priority_queue`), and increases by another 33% when remapping through the index map is not needed because keys are already `usize`s. […] `update_fn` […] complexity. `update_fn` […] (just contains the two `Vec`s).

> About your fork, it cannot be merged now because it has too big changes in the API and uses IndexMap

I was aware my changes were not PR-ready yet, which is why I did not open a PR (in particular, I did not update the documentation). However: […]

Can you enlighten me on these two points?
Hi, Ten0!
[…] `_opt` methods, since the user can make any comparison with the value from the returned entry. The other possible option is making a `usize`-indexed version.
My current version, with 2 Vecs and 1 HashMap, improved the worst cases for the queue by 50-60% (3.5 ms, down from 7 ms) but worsened the best cases almost twofold (added 160 microseconds to 160) in my benches. I suppose that regression comes from the additional level of indirection in the O(1) cases.
Now I will try to implement a version with IndexMap and release an improved 0.1.3 version with a compatible API.
@Ten0 Also, if you decide to keep your unsafe hacks, the approach from std looks safer. It uses RAII to ensure that we will not produce UB even on panics.
Here is my variant for my queue: https://gist.github.com/AngelicosPhosphoros/93ecc4e88771de874b9909ce73730383
@Ten0 I implemented the queue using IndexMap.
The main pro of this is a great simplification: it allows keeping almost the same code as in the old version, but with indices into the ordered map instead of keys.
The other great advantage is that IndexMap removes the requirement of the `Clone` trait on keys and lowers memory usage for those keys.
It also improves the timings of any kind of item removal from the queue (due to the lack of extra hashing on lookup when reassigning indices to keys).
But it has a big disadvantage: it is slower on pushes, sometimes up to 30% in comparison with my current optimized variant, or 130% in comparison with v0.1.2.
> Also, if you decide to keep your unsafe hacks, the approach from std looks safer. It uses RAII to ensure that we will not produce UB even on panics.
This indeed feels much safer! I'm a bit disappointed that this `Hole` structure isn't available in std or in a library. By the way, I tried reverting to a fully safe version, and it showed a 7% performance decrease on my benchmarks. Using the `Hole` version has also shown a decrease (which I find surprising), though a smaller one.
I'm probably going to do some more serious benchmarking before choosing between the `Hole` and safe versions. :)
> But it has a big disadvantage: it is slower on pushes, sometimes up to 30% in comparison with my current optimized variant, or 130% in comparison with v0.1.2.

Hmm... interesting! If you find out why (or have already found it out), I would be interested to know the cause. :)
> Entry API

I like the entry API for a lot of uses, and it would indeed bring the most flexibility. It would definitely be a plus! That being said, whatever the algorithm I was implementing, I have never needed to do anything conditional other than "add the key if its priority is greater, unless it has already been removed", so I like the idea of having these shortcuts. But it's very true that the code for these shortcuts would be much better if implemented via the entry API.
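To make the "add the key if its priority is greater" shortcut concrete, here is a sketch of that pattern expressed through std's `HashMap` entry API as an analogy. The `push_if_greater` helper is hypothetical (not keyed_priority_queue's actual API), and a real keyed queue would additionally sift the corresponding heap entry after the update:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// "Insert, or raise priority if greater" in one map lookup.
/// Hypothetical helper for illustration; `best` only tracks the
/// best-known priority per key, a real queue would also sift its heap.
fn push_if_greater(best: &mut HashMap<&'static str, u32>, key: &'static str, prio: u32) -> bool {
    match best.entry(key) {
        Entry::Vacant(v) => {
            v.insert(prio);
            true
        }
        Entry::Occupied(mut o) => {
            if prio > *o.get() {
                o.insert(prio);
                true // caller would sift the heap entry up here
            } else {
                false
            }
        }
    }
}

fn main() {
    let mut best = HashMap::new();
    assert!(push_if_greater(&mut best, "a", 5));
    assert!(!push_if_greater(&mut best, "a", 3)); // lower priority: ignored
    assert!(push_if_greater(&mut best, "a", 9));
    assert_eq!(best["a"], 9);
    println!("ok");
}
```

The entry API does the lookup once and exposes both the vacant and occupied cases, which is exactly what such conditional shortcuts need.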
I released version 0.1.3 with a fix for the hashmap-update drawback :)
@Ten0 Can you please run your benchmarks against my 0.1.3 version?
These are the performances I get with the current versions:

```text
[src/main.rs:61] run_time = 847.497744ms <- priority_queue
[src/main.rs:61] run_time = 1.385735955s <- 0.1.2
[src/main.rs:61] run_time = 687.825406ms <- 0.1.3
[src/main.rs:61] run_time = 514.099194ms <- My version without `unsafe` (using swap)
[src/main.rs:61] run_time = 331.377943ms <- usize-keyed version without `unsafe`
[src/main.rs:61] run_time = 484.639453ms <- My version with the `Hole` circular move
[src/main.rs:61] run_time = 325.943746ms <- usize-keyed version with the `Hole` circular move
[src/main.rs:61] run_time = 462.673136ms <- My version with the unsafiest `unsafe`
[src/main.rs:61] run_time = 307.042517ms <- usize-keyed version with the unsafiest `unsafe`
```
I currently have very little time to develop a good implementation of a usize-keyed queue :(
I hope most of this issue is solved by releases 0.1.3 and 0.2.0, so I will close the issue for now.
I added the possibility to use custom hashers in 0.3.1. This allows users to use faster hashers if they need them.
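For context, custom hashers in Rust are plugged in through the standard `BuildHasher` mechanism. The sketch below shows that plumbing with std's `HashMap` and a toy FNV-1a hasher; it illustrates the general mechanism only, not keyed_priority_queue's exact 0.3.1 API:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// A toy FNV-1a hasher: fast, non-cryptographic, illustration-only.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV offset basis
    }
}

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }
}

// The `S` type parameter that std's HashMap exposes is how a container
// (including a keyed queue) can let users swap in a faster hasher.
type FnvMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    let mut m: FnvMap<&str, u32> = FnvMap::default();
    m.insert("key", 1);
    assert_eq!(m["key"], 1);
    println!("ok");
}
```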
In order to avoid doing log(n) calls to the hashmap when updating the priorities, the `priority_queue` library has an intermediate vector for mapping indexes of the map to indexes of the heap (here). This allows for a much cheaper O(1) update (`Vec` instead of `HashMap`).

However, I like the idea of having more stuff public (why didn't `priority_queue` make their iterators public?), using safe code, and splitting the heap implementation from the mapping storage. (Is that why you've decided to recreate it?) But since the performance is better with an intermediate `Vec`, it'd still probably be better to have one inside the `BinaryHeap` structure.

I am currently looking for a library that allows for updatable priorities when all my data is already remapped to indexes, and I don't want to pay the additional cost of the HashMap. (Similar to what I have implemented in C++ here: https://github.com/Ten0/updatable_priority_queue.) If `BinaryHeap` could be exposed with the previously mentioned `Vec`-inside optimization and `KeyedPriorityQueue` was just a wrapper around it for doing the remapping with a HashMap, you would gain performance when using the HashMap and leave room for custom remappings that may be more performant than the HashMap in some cases.

Another trick you can use to halve the number of writes during a heapify operation is to not swap, but instead keep the element you're trying to move on the side, only move up the elements as you go down, and finally place your element at the bottom. (Easier to do once all your elements are `Clone`, which they are as `usize`s, because you've got internal remapping.)

As a side note, I'm very surprised you need to `unsafe impl` `Sync` & `Send` on your structures: normally the compiler figures these out itself, provided that everything that is contained is `Sync` and `Send`. If that's indeed the case, you should never implement these yourself unless it's not automatically derived by the compiler, you understand why it's not derived by the compiler, and you have very good reason to know that your implementation guarantees memory safety despite the reasons why the compiler did not derive it automatically.
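The intermediate-`Vec` idea above can be sketched in a few dozen lines. `IndexedMaxHeap` is a hypothetical name, keys are assumed to be `0..n` usizes, and `pop`/sift-down are omitted for brevity; the point is that `positions[key]` locates a heap entry in O(1), so a priority update costs only the O(log n) sift:

```rust
/// Sketch of the intermediate-Vec idea: `positions[key]` tracks where
/// each usize key currently sits in the heap, so finding the entry to
/// update is an O(1) Vec index instead of a hash lookup.
struct IndexedMaxHeap {
    heap: Vec<(usize, i64)>, // (key, priority)
    positions: Vec<usize>,   // key -> index in `heap`
}

impl IndexedMaxHeap {
    fn new(n: usize) -> Self {
        IndexedMaxHeap { heap: Vec::new(), positions: vec![usize::MAX; n] }
    }

    fn push(&mut self, key: usize, prio: i64) {
        self.positions[key] = self.heap.len();
        self.heap.push((key, prio));
        self.sift_up(self.heap.len() - 1);
    }

    /// O(1) location of the entry, then an O(log n) sift.
    fn increase_priority(&mut self, key: usize, prio: i64) {
        let i = self.positions[key];
        debug_assert!(prio >= self.heap[i].1);
        self.heap[i].1 = prio;
        self.sift_up(i);
    }

    fn peek(&self) -> Option<(usize, i64)> {
        self.heap.first().copied()
    }

    /// Swap two heap slots, keeping `positions` in sync.
    fn swap(&mut self, a: usize, b: usize) {
        self.heap.swap(a, b);
        self.positions[self.heap[a].0] = a;
        self.positions[self.heap[b].0] = b;
    }

    fn sift_up(&mut self, mut i: usize) {
        while i > 0 {
            let parent = (i - 1) / 2;
            if self.heap[i].1 <= self.heap[parent].1 {
                break;
            }
            self.swap(i, parent);
            i = parent;
        }
    }
}

fn main() {
    let mut q = IndexedMaxHeap::new(3);
    q.push(0, 10);
    q.push(1, 5);
    q.push(2, 7);
    assert_eq!(q.peek(), Some((0, 10)));
    q.increase_priority(1, 42); // O(1) lookup through `positions`
    assert_eq!(q.peek(), Some((1, 42)));
    println!("{:?}", q.peek());
}
```

A keyed wrapper would then only need a `HashMap<Key, usize>` in front of this structure, which is exactly the split between heap and mapping storage suggested above.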