Closed bsteinb closed 8 years ago
Just a random programmer from the interwebs, so take my 0.02 with as much salt as you want.
The Head api sounds to me kind of like the Entry api, and seems good to me.
Ok, now for my crazy thoughts. Add the ability for the heap to be in a broken state. If the heap is in correct form, then pop returns the max item and leaves a hole in the heap. If the heap has a hole, then pop fixes the hole and then returns the max item and leaves a hole in the heap. If the heap is in correct form, then push dose the usual. If the heap has a hole, then push replaces the hole with the new item and fixes the invariant.
I think this naturally optimizes pop then push without new syntax. Is this crazy?
I think it could work with a kind of Entry-like API that mutably borrows the heap, mediating all accesses to it while in an inconsistent state. However, this would hinge on drop()
being called which is not guaranteed, so we'd need to poison the heap (e.g. setting size=0
or better some invalid value) while in an invalid state, and restore it on return.
We've added this to the standard library's BinaryHeap: https://doc.rust-lang.org/beta/std/collections/struct.BinaryHeap.html#method.peek_mut. It's up for stabilization in 1.12.
@sfackler apparently this does neither poison nor leak-amplify.
It doesn't need to. The worst that happens if the guard is leaked is that the heap is improperly ordered, which can already happen if an Ord
method panics. Things accessing the heap after that happens may see improperly ordered values, but there are no safety issues.
That is true. However, it means the operation does not fail fast, but leaves the heap in an inconsistent state, which means subsequent operations can silently return wrong values. This is a quite unfortunate error state, if you ask me. Poisoning – perhaps at least in debug mode – would be a preferable solution if you ask me.
I'm not aware of a context in which someone would end up accidentally leaking the guard, so it doesn't really seem like something worth investing all that much effort in. I don't believe any other guards poison themselves after a leak (e.g. Vec::drain
). They in general do the easiest thing that avoids memory corruption etc.
Hello container enthusiasts.
I have recently been working on a K-Way merge for the
itertools
crate which internally usesBinaryHeap
fromstd
(https://github.com/bluss/rust-itertools/pull/97). UsingBinaryHeap
's interface as it is, the merge algorithm goes through a lot ofpop() -> modify -> optional push()
sequences. This is similar to thepop() -> modify -> push()
sequence that adecrease-key
/increase-key
operation can be mapped to. Both sequences can be optimised by allowing (safe) access to the first element of the heap for inspection/modification/removal and only restoring the heap invariant once the operation is complete (unlikepop() -> modify -> push()
which has to sift twice).Inspired by @frankmcsherry's optimised implementation of the algorithm using an ad-hoc heap, I have set out to capture the essence of the optimised access pattern in an interface that does not require access to the internals of the heap data structure. I have added them to a clone of the
binary_heap
module.The experiment currently contains two sets of interfaces. One that uses closures for the
modify
part (or themodify -> optional push()
part) and is similar to thepoke()
interface proposed by @llogiq. I have named the functionspop_push()
andpop_and_then_push()
which offer two variants of the optimisedpop() -> modify -> optional push()
sequence andpeek_mut()
for the optimisedpop() -> modify -> push()
sequence by way of a mutable reference.The other interface should cover the same operations but is based on wrapper types that provide access to the greatest element and restore the heap invariant in
drop()
(similar toVec
sDrain
wrapper). The wrapper types areHead
which allowspop() -> modify -> optional push()
andHeadRef
forpop() -> modify -> push()
.While the closure based interface is safer (no wrappers that might be leaked), the wrapper interface feels more rustic (just like
Iterators
andDrain
, I guess). The implementations make use ofunsafe
code for a performance improvement of roughly 2 x in my experiments based on the K-way merge which operate on small heaps.At the moment, neither interface has the ability to modify any element but the greatest as was discussed in https://github.com/contain-rs/discuss/issues/11.
I now turn to you for feedback on my design. Does the interface seem sensible? Is this something that might be considered for inclusion in
std
? I know it significantly adds to the surface ofBinaryHeap
s interface for just an optimisation, but on the other hand it could be used to either re-implement some of the methods (e.g.pop()
andreplace()
) or altogether subsume them.