Gankra / collect-rs

Miscellaneous Collections
Apache License 2.0
64 stars 21 forks source link

Reorganize `tree::map` module #120

Closed apasel422 closed 9 years ago

apasel422 commented 9 years ago

A lot of the redundant impls could be removed if we used type aliases instead of wrapper types.

Benches:

before:
tree::map::bench::iter_1000          :     10601 ns/iter (+/- 599)
tree::map::bench::iter_100000        :   1825796 ns/iter (+/- 271541)
tree::map::bench::iter_20            :       224 ns/iter (+/- 10)

tree::map::bench::lower_bound_1000   :      5261 ns/iter (+/- 206)
tree::map::bench::lower_bound_100000 :   1090768 ns/iter (+/- 268915)
tree::map::bench::lower_bound_20     :       133 ns/iter (+/- 6)

after:
tree::map::bench::iter_1000          :      2823 ns/iter (+/- 186)
tree::map::bench::iter_100000        :   1039052 ns/iter (+/- 243046)
tree::map::bench::iter_20            :        67 ns/iter (+/- 3)

tree::map::bench::lower_bound_1000   :      1674 ns/iter (+/- 86)
tree::map::bench::lower_bound_100000 :    378951 ns/iter (+/- 94456)
tree::map::bench::lower_bound_20     :        52 ns/iter (+/- 4)
apasel422 commented 9 years ago

The doc comments for the new structs also need to be updated.

Gankra commented 9 years ago

Still reading, might find answers eventually, but just asking this now:

Do we still need the warts of RevIter and lower/upper_bound now that we have this nice generic tooling?

Have you benched other search time changes after this refactoring?

apasel422 commented 9 years ago

Sort of. Ultimately, it would be nice to do this, per your Get trait proposal:

let map: TreeMap<u32, char> = ...
let value: Option<&char> = map.get(&1);
let iter_from: SomeIter = map.get(1..);
let iter_to: SomeIter = map.get(..1);
let iter_range: SomeIter = map.get(1..3);
let iter_all: Iter = map.get(..);

with mutable variants. But that won't provide distinction between open and closed bounds (the {lower,upper}_bound methods should be renamed anyway) until we get such a range type in std, or the RevIter issue, because it is not trivial to implement DoubleEndedIterator, at least not without parent pointers. I'm not familiar with existing algorithms for it, but the naive (only?) approach would be to maintain two stacks of nodes.

As for benching, I actually believe that the benches for searching are currently useless due to the lack of black_box. I'll investigate and post some numbers.

Gankra commented 9 years ago

Note that returning from the bench closure feeds that value into a black_box (dunno what the benches look like today).

Another proposal I've been rattling around in my head has been:

fn pred(&self, Bound<&K>) -> Option<&V>
fn succ(&self, Bound<&K>) -> Option<&V>
(plus mutable variants)

where Inclusive and Exclusive are as expected, and pred(Unbounded) == max and succ(Unbounded) == min. A bit of a hack to reduce the combinatoric explosion in the api of wanting to support find_(pred|succ)_(inclusive|exclusive)_(mut?) + (min|max)_(mut?). Although in reality the combinatorics are there, they've just been squished a bit to passing in a Bound instead of calling a particular method. Maybe that makes things a bit too messy, though (and the min/max hack is maybe more cute than useful).

In theory you want these methods in addition to the iterator-y ones because the iterators need to do more work storing search paths and whatnot.

apasel422 commented 9 years ago

Yeah, unfortunately the current benches have a trailing semi-colon, so I think they're feeding ().

As for your proposal, that's certainly a possibility. I will explore a bit more. I think we can leverage traits and type-level bounds (struct Closed<T>(T); struct Open<T>(T);) to reduce the surface area.

Another combinatoric explosion is waiting to happen via the Entry API, which hasn't been implemented for TreeMap yet (but which is on my to-do list).

apasel422 commented 9 years ago

For the benches, it would be great if you could bring in what you did for rust-lang/rust#22151.

apasel422 commented 9 years ago

I have an alternative trait-based design where you can do:

let map: TreeMap<u32, &str> = ...;
map.get(Min);
map.get(Lt(&1));
map.get(Le(&1));
map.get(&1);
map.get(Ge(&1));
map.get(Gt(&1));
map.get(Max);

with mutable variants. The struct names can be changed to better indicate what they do. I have to investigate how feasible this is for the iterator and entry methods (i.e. without code duplication), but it would certainly reduce the number of methods.

Gankra commented 9 years ago

Oh that's a really nice design!

apasel422 commented 9 years ago

I'm still working on the design, so I'd hold off on continuing the review for now.

apasel422 commented 9 years ago

Benches before 94ee0ae:

tree::map::bench::find_rand_100    ...  7 ns/iter (+/- 1)
tree::map::bench::find_rand_10_000 ... 76 ns/iter (+/- 14)
tree::map::bench::find_seq_100     ...  7 ns/iter (+/- 4)
tree::map::bench::find_seq_10_000  ... 39 ns/iter (+/- 3)

Benches after 94ee0ae:

tree::map::bench::find_rand_100    ...  7 ns/iter (+/- 1)
tree::map::bench::find_rand_10_000 ... 74 ns/iter (+/- 6)
tree::map::bench::find_seq_100     ...  7 ns/iter (+/- 1)
tree::map::bench::find_seq_10_000  ... 40 ns/iter (+/- 4)
apasel422 commented 9 years ago

One downside of this is that it requires importing Max/Min, and later it will require importing the predecessor/successor queries. Inherent methods do have an advantage there.

Gankra commented 9 years ago

Yeah. Also we're not really getting rid of combinatorics, just adding a weird calling convention. Internally all the combinatorics are still there, right?

apasel422 commented 9 years ago

Yes, though this is moving toward less redundancy in the implementation too. In order to reduce it further, I'd like to be able to just transmute from immutable references to mutable ones when we know it's safe to do so. My goal with the design, though it'll show more once the Entry and removal operations are supported, is to be able to create an Entry (or perform removal) without the queries themselves knowing how to do so. (A wrapper function builds a stack of nodes as the query itself progresses down the tree.) This is very simple to achieve with equality and min/max, but harder to do automatically with predecessor/successor.

Externally, we're either going to have:

map.get(&1 / Succ(&1) / Max);
map.get_mut(&1 / Succ(&1) / Max);
map.remove(&1 / Succ(&1) / Max);

or

map.get(&1);
map.get_mut(&1);
map.remove(&1);
map.succ(&1);
map.succ_mut(&1);
map.remove_succ(&1);
map.max();
map.max_mut();
map.remove_max();

To say nothing of the Entry API, Pred, inclusive versions of Pred and Succ (which could be represented as separate types, separate methods, or a bool parameter), Min, iterators (which come in reverse and mutable varieties), or the methods that return both the key and value (the latter of which can be mutable).

Of course, {get, get_mut, remove} could all be replaced with just the Entry API (using either of the above conventions), but that would be unergonomic and inefficient.

Java's NavigableMap interface is a good example of the combinatoric issue.

apasel422 commented 9 years ago

Closing in favor of a more principled future approach.