GlenKPeterson / Paguro

Generic, Null-safe, Immutable Collections and Functional Transformations for the JVM

RRB-Tree (was: Best way to create PersistentVector with multiple elements?) #4

Closed pniederw closed 7 years ago

pniederw commented 8 years ago

What's the preferred way to create a PersistentVector with multiple elements? Would you recommend to first create a list or array, then use concat to benefit from the mutable vector optimization? Would it make sense to have a builder for this? A builder could perhaps perform additional optimizations, such as using a different representation for vectors with a small number of elements.

GlenKPeterson commented 8 years ago

Excellent questions!

"What's the preferred way to create a PersistentVector with multiple elements?"

If you have a fixed number of elements, use vec(a, b, c)

To create a vector from a traditional Java collection, and maybe transform the elements somehow (like filtering some of them):

xform(collection)
    .filter(x -> x != 3)
    .toImList()

If you are stuck building a vector in a legacy loop, there's a MutableVector inside PersistentVector, but I tried to make it really hard to access. If you think you really need it, please post the code that justifies exposing this potentially confusing feature for performance reasons.

"Would you recommend to first create a list or array, then use concat to benefit from the mutable vector optimization?"

No. Creating lists or arrays will likely be more expensive than using the immutable concat (TODO: prove this one way or the other).

Your unspoken follow-up questions are:

  1. Why not use PersistentVector.MutableVector inside Transformable.toImList()?

This is an oversight. Thanks for pointing it out. I can fix this over the weekend and have a new version for you on Monday by adding the following static method to PersistentVector (to avoid exposing MutableVector):

public static <T> ImList<T> fromXform(Transformable<T> trans) {
    // Fold into a mutable vector, then freeze it into an immutable one.
    return trans.foldLeft(emptyTransientVector(),
                          MutableVector<T>::append).persistent();
}

And changing the body of Transformable.toImList() to:

        return PersistentVector.fromXform(this);

But I want to run JMH (the Java Microbenchmarking Harness) to prove that it makes a positive difference for vectors of various sizes before I commit the change.

  2. Why create an unnecessary list wrapper inside StaticImports.vec(T... items)?
// INTENDED USE
ImList<Integer> myVec = vec(1,2,3,4,5,6);

// ALSO POSSIBLE: PROTECT AGAINST MUTATION
Integer[] myInts = { 1,2,3,4,5,6 };
ImList<Integer> myVec = vec(myInts);
myInts[3] = 359; // myVec still holds 4 at index 3

I didn't want to trust the array we are passed, so I copy it to make sure the situation above causes no mutation inside the vector. Lists are easier to work with. I timed it (without JMH) and it looked like the Arrays.asList() made less than a 10% difference (or no difference) vs iterating the array directly during the copy.
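The defensive copy described above can be sketched with plain java.util types (this is an illustration, not Paguro's internals):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DefensiveCopy {
    // Copy the caller's array so that mutating the array afterwards
    // cannot leak into the supposedly immutable collection.
    static <T> List<T> safeCopy(T[] items) {
        return new ArrayList<>(Arrays.asList(items));
    }
}
```

Arrays.asList() alone would not be enough, since it is only a view over the original array; the copying constructor is what provides the safety.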

If I remember correctly, the real speed limit is on adding an element: we copy an array of N elements (the tail of the existing vector) into an array of N+1 elements (the new tail), then add the new element to the second array. Java writes a 0 into each element of the second array, then copies elements over every one of those zeros. If there were a way to get Java to provide an uninitialized array, the existing implementation might be 50% faster. Maybe when they get rid of Unsafe in Java 9 they'll expose something like this?
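That append path amounts to something like this sketch (the zero-fill happens inside Arrays.copyOf's allocation of the new array):

```java
import java.util.Arrays;

class TailAppend {
    // Copy-on-write append: allocate a zero-filled array of length n+1
    // (the JVM initializes every slot), copy the n old elements over,
    // then write the new element into the last slot.
    static Object[] appendToTail(Object[] tail, Object newItem) {
        Object[] newTail = Arrays.copyOf(tail, tail.length + 1);
        newTail[tail.length] = newItem;
        return newTail;
    }
}
```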

This copy-on-write strategy for the tail array is what makes it thread-safe. Other synchronization strategies could cause deadlocks and might be more (or less) time consuming, but would probably be harder to reason about. I don't know. Without proving that I could make it better, I eventually moved on.

Would it make sense to have a builder for this? A builder could perhaps perform additional optimizations, such as using a different representation for vectors with a small number of elements.

I did several experiments along these lines and had real trouble improving on Rich Hickey's design. But your perspective may yield better results. I am interested in hearing more!

I think Google Guava uses builders to create unmodifiable collections. But then if you know you're going to add more, you're left passing a mutable builder around. If you ever have to add just one more element to an unmodifiable collection, it forces an expensive copy. Both situations encourage surprises. The Clojure collections are always immutable with fairly fast appends. They take less thought and less planning, and it is always safe to add another element without a huge performance penalty.

If you need the kind of performance you can only get by mutating in place, you can rewrite that piece of code using the traditional Java collections. My experience so far with these transforms and the Clojure collections is that my first pass has fewer bugs and runs faster because I'm not worrying about the shortcomings I just mentioned, so I can focus instead on the Big O complexity of what I'm doing. I generally never need to do that rewrite. In fact, things often go faster when I rewrite traditional Java using these collections just because there is nothing to distract me from the Big O consequences of the transformations.

Still, if all transformations are done functionally, that might mean you never have to expose the builder. With enough forethought, you rarely need to add just one more item later. I need to think more on this.

Anyway, I guess timing the vector against Guava would be another good task. That might be the following weekend though.

Great questions. I hope this helped without getting too far off topic. I'm interested in hearing your follow-ups!

pniederw commented 8 years ago

Thanks for your detailed explanations!

If you are stuck building a vector in a legacy loop, there's a MutableVector inside PersistentVector, but I tried to make it really hard to access. If you think you really need it, please post the code that justifies exposing this potentially confusing feature for performance reasons.

A main reason for my being "stuck" is the need to build a vector from right to left. Would it make sense to cover such needs by adding a foldRight operation?

Why create an unnecessary list wrapper inside StaticImports.vec(T... items)? [...] I didn't want to trust the array we are passed, so I copy it to make sure the situation above causes no mutation inside the vector.

If you are referring to the use of Arrays.asList(...), I don't think it will have any safety benefits, as it does not copy the array.

I did several experiments along these lines and had real trouble improving on Rich Hickey's design. But your perspective may yield better results. I am interested in hearing more!

As I'll be dealing with many small collections, I imagined that having a special representation for them might be beneficial. From what I read, http://openjdk.java.net/jeps/269 is an example of optimizing for small collections. However, the discussion in http://dev.clojure.org/jira/browse/CLJ-1517 and some of the linked documents indicate that specialized representations can not only improve performance, but also seriously harm it!

Java writes a 0 into each element of the second array, then copies elements over every one of those zeros. If there was a way to get Java to provide an uninitialized array, the existing implementation might be 50% faster for it. Maybe when they get rid of Unsafe in Java 9 they'll expose something like this?

Best I could find on this topic (not specific to 9): https://bugs.openjdk.java.net/browse/JDK-8146828

Thanks for your work on UncleJim. It's the most promising Java-friendly persistent collections library I could find. It was reassuring to hear that you are building on and/or following the work of Rich Hickey, Paul Phillips, CHAMP, etc. The only thing that made me feel uncertain about the codebase is the large amount of commented out source code.

I've also considered using https://github.com/usethesource/capsule, but it didn't seem quite ready yet, and is currently limited to unordered maps and sets.

GlenKPeterson commented 8 years ago

need to build a vector from right to left. Would it make sense to cover such needs by adding a foldRight operation?

I'm not sure about adding foldRight, but with or without it, the Clojure vector grows from left to right. All of its optimizations are based on that assumption. Any insert causes everything to the right to be copied to the new vector. An insert at position 0 causes the worst-case scenario where the entire collection is copied!

Solutions:

Map

If you know the index, you can use a Map of indices to values as though it were a Vector. It doesn't have quite the performance characteristics of the Vector, but it's pretty close.
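A minimal sketch of that map-as-vector idea, using a java.util.TreeMap as a stand-in (an immutable sorted map would work the same way): keys are indices, so you can insert them in any order, including right to left, and still read values back in index order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class MapAsVector {
    // Build "index -> value" right to left, then read left to right.
    static List<String> reversedBuild(String... items) {
        TreeMap<Integer, String> m = new TreeMap<>();
        for (int i = items.length - 1; i >= 0; i--) {
            m.put(i, items[i]);                 // highest index inserted first
        }
        return new ArrayList<>(m.values());     // values come out in key order
    }
}
```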

Linked List

You can implement your own immutable linked list because these naturally grow right-to-left (and need to be read back left-to-right). If I were to add this to UncleJim, it would extend UnmodIterable because it has no efficient way to look up by index or return its size. It would look something like:

class Cons<T> extends Tuple2<T,Cons<T>> implements UnmodIterable<T> {
    Cons(T val, Cons<T> next) { super(val, next); }
    public T val() { return _1; }
    public Cons<T> next() { return _2; }

    // Implemented as an optimization because consing onto the front is the most
    // efficient operation for this data structure (note that the elements of
    // list end up in reverse order at the front of the result).
    @Override public Cons<T> precat(Iterable<? extends T> list) {
        Cons<T> c = this;
        for (T t : list) {
            c = new Cons<>(t, c);
        }
        return c;
    }

    @Override public UnmodIterator<T> iterator() {
        // Assumes a null next pointer marks the end of the list.
        return new UnmodIterator<T>() {
            private Cons<T> c = Cons.this;
            @Override public boolean hasNext() {
                return c != null;
            }
            @Override public T next() {
                T item = c._1;
                c = c._2;
                return item;
            }
        };
    }
}

I think at one point I had an append() and prepend() on Transformable, but took them off because I never used them and they were very hard to implement efficiently for all collections. If this were a useful class, it might be worth adding prepend() back. I don't know. I can't think about all the pros and cons right now (get it - cons? har har).

One reason I didn't add a linked list is that they don't strike me as very efficient. They take double the space of what you're storing, and unless the JVM is smart enough to recognize what you're doing, there is no guarantee that the nodes are stored adjacent to each other in memory. That causes lots of cache misses when you iterate through the list - and iteration is the most efficient way to access it!

Reversed List Wrapper

Build the vector left to right (the reverse of what you want) because this is efficient. Then call .listIterator(size()) and instead of using next() and hasNext(), use previous() and hasPrevious(). It means using loops, but I'm just trying to give you options here. Better yet, formalize this with an ImReversedList wrapper like:

class ImReversedList<T> implements UnmodList<T> {

    private final ImList<T> innerList;
    ImReversedList(ImList<T> list) { innerList = list; }

    @Override public int size() {
        return innerList.size();
    }

    @Override public boolean equals(Object o) {
        if ( (o == null) || !(o instanceof List) ) { return false; }
        List other = (List) o;
        if (other.size() != size()) { return false; }
        return UnmodSortedIterable.equals(this, UnmodSortedIterable.castFromList(other));
    }

    @Override public int hashCode() {
        // List.hashCode() is order-dependent, so compute it from this
        // (reversed) iteration order rather than delegating to innerList.
        int h = 1;
        for (T t : this) { h = 31 * h + (t == null ? 0 : t.hashCode()); }
        return h;
    }

    @Override public T get(int i) {
        return innerList.get(innerList.size() -1 - i);
    }

    public ImReversedList<T> prepend(T t) {
        return new ImReversedList<>(innerList.append(t));
    }

    @Override public UnmodSortedIterator<T> iterator() {
        // Start the cursor past the last element so previous() yields them all.
        UnmodListIterator<T> iter = innerList.listIterator(innerList.size());
        return new UnmodSortedIterator<T>() {
            @Override public boolean hasNext() {
                return iter.hasPrevious();
            }

            @Override
            public T next() {
                return iter.previous();
            }
        };
    }
}

Notice the prepend() method. I would consider something like that, if it weren't for RRB Trees...

RRB Tree

Implement a Bagwell/Rompf RRB tree instead because it has essentially the same performance characteristics as PersistentVector and also supports inserts: https://infoscience.epfl.ch/record/169879/files/RMTrees.pdf That paper says it's based on a Red-Black Tree, but I think they really mean B-tree. The B-tree in Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein (p. 490) could easily be done immutably with the modifications described in that paper. I had it mostly done, but I thought I was being clever and changed something I shouldn't have. I think I could finish it in a weekend, but then I'd have to test it.

If the RRB tree could mostly equal the performance of the PersistentVector, I'd replace it in this project. This has the added benefit of one more piece of Apache code and one less of Eclipse. Also, I think this was Rich Hickey's suggestion for PersistentVector inserts. I like this solution the best.

If you are referring to the use of Arrays.asList(...), I don't think it will have any safety benefits, as it does not copy the array.

I had written another 3 pages of response to your question, but cut them for brevity. Arrays (and varargs) are a mess when it comes to type-safety. I had to eliminate them from the project to finish and have confidence of correctness. Wrapping the 3 varargs in Arrays.asList() allowed me to forget about those issues for only a very small performance penalty. The alternative would be to provide a PersistentVector constructor that takes varargs and that becomes part of the API. I really didn't want that!

As I'll be dealing with many small collections, I imagined that having a special representation for them might be beneficial... specialized representations can not only improve performance, but also seriously harm it!

Performance on the JVM is hard to measure and harder to understand. Clojure had a separate implementation for small Maps which I did not include in UncleJim. It takes a lot of work to "First, do no harm" and verify that every performance optimization is correct and actually makes a difference that's worth the added complexity. I believe that the RRB tree is a proven win in the Clojure community, but it's implemented completely in Clojure. I'd want a total rewrite in Java.

Thank you for all your kind words and for your interest in this project. I skimmed all of those links and plan to go back to each one when I have more time - all very interesting. I appreciate your work. I've eliminated a lot of bugs by incorporating this project at work where it's been running in production for over a year. I spent a lot of time adding unit tests and fussing over the interfaces. I put a lot of trust in it, for what that's worth, but I have no idea how many other people use it.

Yes, Rich Hickey and Paul Phillips have amazing ideas. I think it may be time to add Bagwell and Rompf to the list of inspirations... Who is CHAMP?

In terms of commented-out code, that's good feedback. I went back and forth many times on the best way to do some things and left some of the other ways in there. Sometimes I like to leave something in there to say, "I thought of this, but decided against it." I like to know exactly what it was I had thought of because sometimes the tiniest change can make a huge difference. I have also implemented some things and spent a fair amount of time on them, but without testing them didn't want to make them live - like using a Map to allow somewhat efficient inserts into a Vector. So that's the reason for that. I guess I can still see them going back in source control. I tried to eliminate all dead code, but anything commented out wasn't a priority. I'll consider cleaning some of that up.

pniederw commented 8 years ago

Sorry I didn't express myself correctly. I do build the new vector from left to right, but do so while iterating the old vector from right to left. I already have a reverse java.lang.Iterator/Iterable backed by ListIterator for that. What I didn't consider is that using Xform.of() on that (or writing my own ImReversedList as you showed) would allow me to use the regular transformations, avoiding the need to iteratively build a vector in my own code. Thanks for the tip!

If the RRB tree could mostly equal the performance of the PersistentVector, I'd replace it in this project. This has the added benefit of one more piece of Apache code and one less of Eclipse. Also, I think this was Rich Hickey's suggestion for PersistentVector inserts. I like this solution the best.

I think this is the right direction to look. Concatenation and insert in O(logN) would be very nice to have.

Yes, Rich Hickey and Paul Philips have amazing ideas. I think it may be time to add Bagwell and Rompf to the list of inspirations... Who is CHAMP?

CHAMP is an attempt to improve on HAMT: http://michael.steindorfer.name/publications/oopsla15.pdf The paper's author has an implementation here: https://github.com/usethesource/capsule I thought I had seen a reference to CHAMP somewhere in your project (as a thing to consider), but maybe I just mixed up my facts.

GlenKPeterson commented 8 years ago

Xform.of() (and everything in this project) takes an Iterable, not an Iterator. But it sounds like you've got that under control.

I forgot to mention that we could expand RangeOfInt to take a "step" parameter like every other Range class out there. You could give it -1 and a starting value and it would count down instead of up. Then you could write:

RangeOfInt.of(myVec.size() - 1, 0, -1)
    .map(i -> myVec.get(i))
    .whatever...

I have to check the order of parameters of the Range class in Scala, Python, and other languages to make sure to match what they all seem to do. I always intended to add a step to Range. I can do that this weekend if you think you'd find it useful, or you can do it if you feel inspired and I'll merge it. I think the existing unit tests for Range are pretty thorough, so it should be easy to expand without breaking. I don't remember the iterator() on the vector being faster than getting the elements by index, so there shouldn't be a significant performance penalty for doing this.

The ImReversedList could actually be handy. Well, not as handy as the RRB Tree. So I guess RRB would be a higher priority. If you'd find the ReversedList more useful, I'd be interested in why. I really tried not to have duplicate functionality where it's practical to avoid it.

I look forward to learning about CHAMP and HAMT! I don't understand every aspect of the implementation of the PersistentHashMap, especially not the hash collision part. Nor have I read the papers about it. That's nice that there's an implementation - we can measure the performance against Rich Hickey's implementation before deciding whether to swap that code or not. :-)

Unless you have a reason otherwise, I'm prioritizing:

pniederw commented 8 years ago

I don't remember the iterator() on the vector as being faster than getting the elements by ID

Looking at the code, the iterator seems to do less work, by caching the result of leafNodeArrayFor().
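A toy illustration of that caching effect (not Paguro's actual code): with 32-element leaf arrays, an iterator only has to locate a leaf once per 32 elements, whereas get(i) would have to locate one per element.

```java
import java.util.List;

class ChunkedIteration {
    static final int CHUNK = 32;

    // Sum all elements, re-fetching the leaf array only at chunk boundaries.
    static long sumCachingLeaves(List<int[]> leaves, int size) {
        long sum = 0;
        int[] leaf = null;
        for (int i = 0; i < size; i++) {
            if (i % CHUNK == 0) {             // once per 32 elements...
                leaf = leaves.get(i / CHUNK); // ...locate the next leaf
            }
            sum += leaf[i % CHUNK];
        }
        return sum;
    }
}
```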

Unless you have a reason otherwise, I'm prioritizing: [...]

Just to be open about this, I'm not currently able to make contributions to OSS projects. Out of the things you listed, I'd personally be most interested to see a comparison with RRB trees, but please don't take that as an expectation of any kind. Thanks again for UncleJim!

GlenKPeterson commented 8 years ago

Wow! Using the mutable vector inside toImList() made a roughly 5x improvement across the board from 100 items through 10,000 items. Tests done on an i7-4790K @ 4GHz x4. The numbers for the 100,000 test are suspect because the test only got to run once.

In the test names (e.g. buildImList1_10), the first number means 1=immutable vector, 2=mutable vector; the second number is the number of items to be xform'd into an ImList.

java -jar target/benchmarks.jar -wi 5 -i 7

MyBenchmark.buildImList1_10      thrpt   70   4379456.468 ±   9603.377  ops/s
MyBenchmark.buildImList2_10      thrpt   70  10764304.049 ± 107803.439  ops/s
2.5x faster

MyBenchmark.buildImList1_100     thrpt   70    456896.497 ±   6712.671  ops/s
MyBenchmark.buildImList2_100     thrpt   70   2361390.877 ±  14776.280  ops/s
5.1x faster

MyBenchmark.buildImList1_1000    thrpt   70     46062.936 ±    448.272  ops/s
MyBenchmark.buildImList2_1000    thrpt   70    262964.786 ±   4290.914  ops/s
5.7x faster

MyBenchmark.buildImList1_10000   thrpt   70      4655.640 ±     42.787  ops/s
MyBenchmark.buildImList2_10000   thrpt   70     18871.947 ±    279.877  ops/s
4x faster

MyBenchmark.buildImList1_100000  thrpt   70       453.247 ±      1.364  ops/s
MyBenchmark.buildImList2_100000  thrpt   70      1857.973 ±     12.441  ops/s
4.1x faster

This improvement is in version 1.0.2, which I just deployed to Sonatype / Maven Central. Thank you! I put your name in the Change Log.

GlenKPeterson commented 8 years ago

I re-read the paper on RRB-Trees this weekend and it mentioned that the Clojure HashMap uses HAMT, so maybe that's where you heard it recently?

The nodes in an RRB-Tree can be either equivalent to the PersistentVector nodes (indexed by bitwise operations on the given index), or they can contain a list of indexes for each sub-node so that each node can have a different size (indexed using subtraction on the given index). It's like implementing 2 similar data structures and mixing the nodes up based on which performs better for the given operations and structure so far. Pretty cool.
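The two indexing strategies can be sketched like this (radix 4 to match the examples elsewhere in this thread; all names are illustrative, not taken from any actual implementation):

```java
class RrbIndexing {
    // Strict node: the child slot falls out of bit operations on the index.
    static int strictChild(int index, int shift) {
        return (index >>> shift) & 0x3;   // radix 4 => 2 bits per level
    }

    // Relaxed node: search the cumulative end indices for the right child...
    static int relaxedChild(int[] endIndices, int index) {
        int child = 0;
        while (index >= endIndices[child]) { child++; }
        return child;
    }

    // ...then subtract the previous end index to get the index within it.
    static int relaxedSubIndex(int[] endIndices, int index) {
        int child = relaxedChild(endIndices, index);
        return child == 0 ? index : index - endIndices[child - 1];
    }
}
```

With leaves of 4, element 13 lives in child 3 either way; the strict version gets there with a shift and a mask, the relaxed one with a search and a subtraction.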

It also uses an optimization of having a constant-time array just like the tail in PersistentVector, but instead of always pointing to the tail of the data structure, it points to wherever in the vector was last inserted. So it could be the head, or the tail, or some place in the middle. I guess it could be an anti-pattern if the insert indices are random, but they are probably usually at the start or end. If you are spraying inserts all over the data structure, you should maybe be using a Sorted Map instead anyway (more on that below).

I'll review B-Trees in my algorithm book and review what I had built before. Then if it all gels, I'll put it together. It's not the easiest data structure to code, and then it needs to be made as fast as one of the wonders of the modern programming world. I can't give you a specific timeline, but I plan to work toward it and will keep you updated.

I really appreciate your succinct and accurate bug reports this morning. I really hope you have easy success with UncleJim, but so far you've had to do work! :-( I've had 2 more thoughts about your specific use case:

  1. Sorting: Clearly order is important, or you wouldn't want to reverse your list. How did that list get in order in the first place? Did you sort it in memory? If so, could you use a SortedSet (or SortedMap) with a Comparator instead of a List?

I never need to reverse a list because I use a SortedSet or SortedMap with a Comparator in those cases. The best comparison-based sorting algorithm is O(n log n) anyway. The performance difference between a Sorted Set/Map and a List is a very small log factor, O(log32 n) vs. O(log2 n), where the constant factor for creation of the log2 structure is smaller than that of the log32 one. The lookup speed is reversed, but for iteration the difference may not matter much. If you stick your data in a SortedSet and read it out once, you really aren't leaving much (any?) performance potential untapped. For this reason, I probably use a SortedSet the most of the 5 primary collections (List, Hash Set/Map, and Sorted Set/Map). You can implement both a List and a SortedSet using a SortedMap - PCollections more or less does just that!

  2. Xform.ofReversedList(List list): Assuming that you cannot use a SortedSet for some reason, I could implement a reversed Xform pretty easily. Much more easily than building an RRB Tree! I think I typed most of the code into a previous post. Would that fix your issue?

I could theoretically also add: Xform.ofReverseMap(SortedMap map) and Xform.ofReverseSet(SortedSet set).
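The SortedSet idea from the Sorting point above can be sketched with a plain java.util.TreeSet (an immutable sorted set with a Comparator would look the same): hand it a reversed comparator up front and there is never a list to reverse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class ReverseSorted {
    // Keep elements in descending order from the start,
    // instead of building an ascending list and then reversing it.
    static List<Integer> descending(Integer... items) {
        TreeSet<Integer> s = new TreeSet<>(Comparator.reverseOrder());
        s.addAll(Arrays.asList(items));
        return new ArrayList<>(s);
    }
}
```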

As an aside, I increasingly think Xform.foldLeft() should be renamed to Xform.fold(). FoldLeft goes backwards on linked lists but forward on other data structures, which is confusing to start with. If the input data structure is not ordered, then Left/Right doesn't mean anything anyway. Thoughts?
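For reference, the fold in question has roughly this shape (an assumed signature for illustration, not Paguro's actual API): the accumulator moves through the collection in iteration order, so for an unordered source "left" carries no meaning.

```java
import java.util.function.BiFunction;

class Folds {
    // fold: start with an initial value and combine it with each element
    // in iteration order.
    static <A, T> A fold(Iterable<T> xs, A init, BiFunction<A, T, A> f) {
        A acc = init;
        for (T t : xs) { acc = f.apply(acc, t); }
        return acc;
    }
}
```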

Thanks for reading!

pniederw commented 8 years ago

Thanks for the quick fixes and release of 1.0.3. Much appreciated.

I use PersistentVector as the underlying data structure for my own List data type/API. My data type offers several methods which require processing in reverse order (e.g. takeLastWhile). Xform.ofReversedList could help with that.

As I want my List data type to be as widely applicable as possible, RRB-Trees could be an improvement, due to O(log n) insert and concat.

Renaming foldLeft to fold sounds sensible to me.

pniederw commented 8 years ago

By the way, here is some interesting material I found on RRB-trees, in case you haven't found it already:

http://hypirion.com/thesis (Java implementation: https://github.com/hyPiRion/pvec-perf/tree/master/pvec) https://github.com/nicolasstucki/scala-rrb-vector

GlenKPeterson commented 8 years ago

Thanks for all the links, for your interest, and for your encouragement! I loved seeing Phil Bagwell present (may he rest in peace).

I've gotten a start on the RRB-Tree. The full implementation involves using strict radix nodes when possible and relaxed radix nodes when not. I've got only strict radix nodes working (and not for replace()), so it doesn't help you yet. It's just a progress milestone to show that I have worked on it since we spoke: https://github.com/GlenKPeterson/UncleJim/blob/2016-05-22_RRB-Tree/src/main/java/org/organicdesign/fp/experimental/RrbTree1.java

I'm trying to build primarily off the paper to make it arguably my own implementation, not a derivative work of someone else's code, so that I can license it under Apache. But when done, I'll certainly check it against other implementations. The paper was many times easier to read than the one implementation I did look at, so it works out that way.

Here is some sample output to show that the tree is being built correctly (I'm using a radix of 4 instead of 32 for easy and thorough testing). We want to see all the left nodes packed before adding a new one on the right. When a level has 4 nodes, we add a new, higher level (growing the tree from the top ensures that it grows evenly):

[0, 1, 2, 3]

Radix2[[0, 1, 2, 3], [4, 5, 6, 7]]

Radix2[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

Radix2[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]

Radix4[Radix2[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
       Radix2[[16, 17, 18, 19]]]

...

Radix4[Radix2[[ 0,  1,  2,  3], [ 4,  5,  6,  7], [ 8,  9, 10, 11], [12, 13, 14, 15]],
       Radix2[[16, 17, 18, 19], [20, 21, 22, 23], [24, 25, 26, 27], [28, 29, 30, 31]],
       Radix2[[32, 33, 34, 35], [36, 37, 38, 39], [40, 41, 42, 43], [44, 45, 46, 47]],
       Radix2[[48, 49, 50, 51], [52, 53, 54, 55], [56, 57, 58, 59], [60, 61, 62, 63]]]

Radix6[Radix4[Radix2[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
              Radix2[[16, 17, 18, 19], [20, 21, 22, 23], [24, 25, 26, 27], [28, 29, 30, 31]],
              Radix2[[32, 33, 34, 35], [36, 37, 38, 39], [40, 41, 42, 43], [44, 45, 46, 47]],
              Radix2[[48, 49, 50, 51], [52, 53, 54, 55], [56, 57, 58, 59], [60, 61, 62, 63]]],
       Radix4[Radix2[[64, 65, 66, 67]]]]

...

Radix6[Radix4[Radix2[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
              Radix2[[16, 17, 18, 19], [20, 21, 22, 23], [24, 25, 26, 27], [28, 29, 30, 31]],
              Radix2[[32, 33, 34, 35], [36, 37, 38, 39], [40, 41, 42, 43], [44, 45, 46, 47]],
              Radix2[[48, 49, 50, 51], [52, 53, 54, 55], [56, 57, 58, 59], [60, 61, 62, 63]]],
       Radix4[Radix2[[64, 65, 66, 67], [68, 69, 70, 71], [72, 73, 74, 75], [76, 77, 78, 79]],
              Radix2[[80, 81, 82, 83], [84, 85, 86, 87], [88, 89, 90, 91], [92, 93, 94, 95]]]]

You may be asking, "Hey, how come it grows in groups of 4 instead of one item at a time?" The "tail" grows one element at a time. When full, the tail gets moved (unchanged) into the tree. Most insertions lengthen the tail only, leaving the tree unchanged. This helps both speed and simplicity. In this implementation, the tail is called the "focus" after the Scala folk's implementation as described in the paper.
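That growth pattern can be modeled with a toy sketch (radix 4, as in the printouts above): appends go into a small tail, and only when the tail fills does a whole leaf move into the tree unchanged.

```java
import java.util.ArrayList;
import java.util.List;

class TailGrowth {
    final List<List<Integer>> tree = new ArrayList<>(); // full leaves, pushed in unchanged
    List<Integer> tail = new ArrayList<>();             // grows one element at a time

    void append(int item) {
        if (tail.size() == 4) {       // tail full: move it into the tree...
            tree.add(tail);
            tail = new ArrayList<>(); // ...and start a fresh tail
        }
        tail.add(item);
    }
}
```

Appending 0 through 8 leaves two full leaves in the tree and a one-element tail, matching the groups-of-4 growth shown in the printouts.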

I'm mostly out of weekend at this point, so it may be some weeks before the Relaxed nodes are implemented, then some weeks more before I have it optimized and benchmarked against the PersistentVector to start to see trade-offs.

pniederw commented 8 years ago

Thanks for the update! Having a good Java implementation of RRB-Trees will be invaluable.

I assume you've also seen their 2015 paper? https://infoscience.epfl.ch/record/213452/files/rrbvector.pdf

GlenKPeterson commented 8 years ago

Inserts at 0 work now and build a balanced tree of Relaxed radix nodes. I know it doesn't show in this example, but it builds as many Strict nodes as it can to start, and only when the tree becomes 3 levels deep does it need to switch to Relaxed nodes: https://github.com/GlenKPeterson/UncleJim/blob/2016-05-22_RRB-Tree/src/main/java/org/organicdesign/fp/experimental/RrbTree1.java

Here is inserting the first 21 items at index 0 (using strictRadix=4, minRadix=3, maxRadix=5 for easier testing). Notice how it uses a Strict node as long as it can before switching to a Relaxed one:

RrbTree(fsi=0 focus=[0]
        root=[])
RrbTree(fsi=0 focus=[1, 0]
        root=[])
RrbTree(fsi=0 focus=[2, 1, 0]
        root=[])
RrbTree(fsi=0 focus=[3, 2, 1, 0]
        root=[])
RrbTree(fsi=0 focus=[4]
        root=[3, 2, 1, 0])
RrbTree(fsi=0 focus=[5, 4]
        root=[3, 2, 1, 0])
RrbTree(fsi=0 focus=[6, 5, 4]
        root=[3, 2, 1, 0])
RrbTree(fsi=0 focus=[7, 6, 5, 4]
        root=[3, 2, 1, 0])
RrbTree(fsi=0 focus=[8]
        root=Strict2[[7, 6, 5, 4], [3, 2, 1, 0]])
RrbTree(fsi=0 focus=[9, 8]
        root=Strict2[[7, 6, 5, 4], [3, 2, 1, 0]])
RrbTree(fsi=0 focus=[10, 9, 8]
        root=Strict2[[7, 6, 5, 4], [3, 2, 1, 0]])
RrbTree(fsi=0 focus=[11, 10, 9, 8]
        root=Strict2[[7, 6, 5, 4], [3, 2, 1, 0]])
RrbTree(fsi=0 focus=[12]
        root=Strict2[[11, 10, 9, 8], [7, 6, 5, 4], [3, 2, 1, 0]])
RrbTree(fsi=0 focus=[13, 12]
        root=Strict2[[11, 10, 9, 8], [7, 6, 5, 4], [3, 2, 1, 0]])
RrbTree(fsi=0 focus=[14, 13, 12]
        root=Strict2[[11, 10, 9, 8], [7, 6, 5, 4], [3, 2, 1, 0]])
RrbTree(fsi=0 focus=[15, 14, 13, 12]
        root=Strict2[[11, 10, 9, 8], [7, 6, 5, 4], [3, 2, 1, 0]])
RrbTree(fsi=0 focus=[16]
        root=Strict2[[15, 14, 13, 12], [11, 10, 9, 8], [7, 6, 5, 4], [3, 2, 1, 0]])
RrbTree(fsi=0 focus=[17, 16]
        root=Strict2[[15, 14, 13, 12], [11, 10, 9, 8], [7, 6, 5, 4], [3, 2, 1, 0]])
RrbTree(fsi=0 focus=[18, 17, 16]
        root=Strict2[[15, 14, 13, 12], [11, 10, 9, 8], [7, 6, 5, 4], [3, 2, 1, 0]])
RrbTree(fsi=0 focus=[19, 18, 17, 16]
        root=Strict2[[15, 14, 13, 12], [11, 10, 9, 8], [7, 6, 5, 4], [3, 2, 1, 0]])
RrbTree(fsi=0 focus=[20]
        root=Relaxed(endIndicies=[4, 8, 12, 16, 20]
                     nodes=[[19, 18, 17, 16], [15, 14, 13, 12], [11, 10, 9, 8], [7, 6, 5, 4], [3, 2, 1, 0]]))

Here's a tree with 32 items in it:

Relaxed(endIndicies=[20, 32]
        nodes=[Relaxed(endIndicies=[4, 8, 12, 16, 20]
                       nodes=[[31, 30, 29, 28], [27, 26, 25, 24], [23, 22, 21, 20], [19, 18, 17, 16], [15, 14, 13, 12]]),
               Relaxed(endIndicies=[4, 8, 12]
                       nodes=[[11, 10, 9, 8], [7, 6, 5, 4], [3, 2, 1, 0]])])

Here's one with 92:

Relaxed(endIndicies=[56, 92]
        nodes=[Relaxed(endIndicies=[20, 32, 44, 56]
                       nodes=[Relaxed(endIndicies=[4, 8, 12, 16, 20]
                                      nodes=[[91, 90, 89, 88], [87, 86, 85, 84], [83, 82, 81, 80], [79, 78, 77, 76], [75, 74, 73, 72]]),
                              Relaxed(endIndicies=[4, 8, 12]
                                      nodes=[[71, 70, 69, 68], [67, 66, 65, 64], [63, 62, 61, 60]]),
                              Relaxed(endIndicies=[4, 8, 12]
                                      nodes=[[59, 58, 57, 56], [55, 54, 53, 52], [51, 50, 49, 48]]),
                              Relaxed(endIndicies=[4, 8, 12]
                                      nodes=[[47, 46, 45, 44], [43, 42, 41, 40], [39, 38, 37, 36]])]),
               Relaxed(endIndicies=[12, 24, 36]
                       nodes=[Relaxed(endIndicies=[4, 8, 12]
                                      nodes=[[35, 34, 33, 32], [31, 30, 29, 28], [27, 26, 25, 24]]),
                              Relaxed(endIndicies=[4, 8, 12]
                                      nodes=[[23, 22, 21, 20], [19, 18, 17, 16], [15, 14, 13, 12]]),
                              Relaxed(endIndicies=[4, 8, 12]
                                      nodes=[[11, 10, 9, 8], [7, 6, 5, 4], [3, 2, 1, 0]])])])
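
A Relaxed node can't use bit-shifting because its children vary in size; instead it stores the cumulative child sizes shown as `endIndicies` above and searches them to find the right child. A sketch of that lookup (hypothetical names, not Paguro's actual code):

```java
import java.util.Arrays;

public class RelaxedIndex {
    // Find which child of a relaxed node holds `index`, given the node's
    // cumulative child sizes (the "endIndicies" printed in the trees above).
    static int childFor(int[] cumulativeSizes, int index) {
        int i = Arrays.binarySearch(cumulativeSizes, index);
        // An exact hit means `index` is the first item of the NEXT child,
        // because cumulativeSizes holds exclusive end indices.
        return (i >= 0) ? i + 1 : -(i + 1);
    }

    public static void main(String[] args) {
        int[] ends = {4, 8, 12, 16, 20}; // the 5-child relaxed root above
        System.out.println(childFor(ends, 0));  // 0
        System.out.println(childFor(ends, 4));  // 1 (first item of child 1)
        System.out.println(childFor(ends, 19)); // 4
    }
}
```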

So that's some progress for you. Remaining items:

I might leave slices and combining until later. The RRB-Tree does those operations in O(log n) ("effectively constant") time, so I want to expose them, but I think that's less critical for most people's needs than a fast replacement for the Vector that supports inserts at any point (particularly at index 0).

GlenKPeterson commented 8 years ago
RrbTree(fsi=12 focus=[98, 99]
        root=
Relaxed(endIndicies=[18, 29, 44, 57, 80, 98]
        nodes=[Relaxed(endIndicies=[3, 6, 9, 12, 15, 18]
                       nodes=[[72, 46, 90], [92, 83, 52], [64, 95, 97], [88, 91, 89], [6, 78, 18], [70, 57, 5]]),
               Relaxed(endIndicies=[4, 7, 11]
                       nodes=[[7, 79, 51, 1], [25, 75, 84], [96, 8, 85, 32]]),
               Relaxed(endIndicies=[3, 6, 9, 12, 15]
                       nodes=[[13, 39, 33], [23, 50, 12], [22, 77, 35], [81, 87, 14], [47, 74, 3]]),
               Relaxed(endIndicies=[5, 8, 13]
                       nodes=[[59, 68, 94, 93, 53], [55, 36, 54], [60, 20, 34, 80, 2]]),
               Relaxed(endIndicies=[4, 7, 11, 15, 20, 23]
                       nodes=[[71, 63, 16, 62], [37, 58, 43], [44, 31, 48, 30], [86, 9, 26, 17], [19, 69, 0, 65, 45], [61, 24, 11]]),
               Relaxed(endIndicies=[5, 8, 13, 18]
                       nodes=[[82, 29, 40, 76, 66], [10, 4, 27], [15, 56, 49, 38, 41], [21, 42, 67, 28, 73]])]))

Have random inserts working in the experimental rrb-tree branch. Not sure whether to optimize now or go for slices/combining, but at least there is progress.

pniederw commented 8 years ago

Meanwhile I realized that I want to use UncleJim's data structures, but can't use its higher-level APIs. In particular, I can't use any callback-based APIs. Any chance to support such low-level usage, e.g. by making MutableVector accessible?

GlenKPeterson commented 8 years ago

Please clarify what you mean by, "can't use higher-level/callback-based APIs." Do you mean you can't use the Transformable/xform stuff, or StaticImports, or something else? Is MutableVector all you need?

It's very reasonable to make MutableVector accessible. I didn't originally because I didn't need it, and it's always easier to add to an API than to take away. The reason I'm writing instead of just doing so is that my first priority for my limited time has been the RRB-Tree, which will likely replace both the Persistent and Mutable Vectors. If that also meets your needs, I'd rather push to get the new one released than stop and make changes to the old one.

I think my second priority (until I better understand your issue) is to make a Java 7 version of this whole project so that Android programmers can use it.

pniederw commented 8 years ago

I can't use the Transformable/xform stuff. If you could design the RRB-Tree API with that in mind, that would be great.

Is a backport to Java 7 really required to make the library available to Android programmers? I don't know much about Android, but have found statements like the following on the web (about the Jack compiler):

Code written with lambda expressions will work on Android versions as far back as Gingerbread, so it should be perfectly suitable for almost any current project.

GlenKPeterson commented 8 years ago

I thought everything worked with the standard Java 8 streams or iterators. No? What's the issue with the xform stuff?

The other issue for Android might be default-methods-on-interfaces which this project makes extensive use of. I don't know. People have asked for it so I'll make an effort.

pniederw commented 8 years ago

There's nothing wrong with the xform stuff. It's just that in my case, I will no longer be able to use lambda/callback-based collection operations because I'm switching to a compiler (Graal) with special requirements. The concrete requirement is that all lambdas/callbacks must be called from my own code (which is adjusted to Graal's needs) rather than third-party code.

GlenKPeterson commented 8 years ago

Split looks like it's working! It should take about the same amount of time as pushing a full focus into the tree the way you would on an insert, which you might call "effectively constant." Here it is with a strict tree (still Radix = 4, with integer 0 at index 0, 1 at 1, etc.):

Strict Tree Split at Index 17 (of 100)

Original

fsi stands for "focus start index" because unlike the PersistentVector's tail, the focus moves around to wherever inserts are being clustered.

RrbTree(size=100 fsi=96 focus=[96 97 98 99]
        root=Strict6(Strict4(Strict2([0 1 2 3] [4 5 6 7] [8 9 10 11] [12 13 14 15])
                             Strict2([16 17 18 19] [20 21 22 23] [24 25 26 27] [28 29 30 31])
                             Strict2([32 33 34 35] [36 37 38 39] [40 41 42 43] [44 45 46 47])
                             Strict2([48 49 50 51] [52 53 54 55] [56 57 58 59] [60 61 62 63]))
                     Strict4(Strict2([64 65 66 67] [68 69 70 71] [72 73 74 75] [76 77 78 79])
                             Strict2([80 81 82 83] [84 85 86 87] [88 89 90 91] [92 93 94 95]))))

Before being split, the focus is pushed into the tree; then the split parts of the leaf node become the focuses of the resulting trees:

Left

RrbTree(size=17 fsi=16 focus=[16]
        root=Strict2([0 1 2 3] [4 5 6 7] [8 9 10 11] [12 13 14 15]))

Note how this is still a strict tree without any extra ancestor nodes, all ready to be appended to without any performance penalty for having been split! The left-hand side of a split of any Strict tree is still Strict (which should have slightly better lookup performance and take a little less storage space).

Right

RrbTree(size=83 fsi=0 focus=[17 18 19]
        root=Relaxed(cumulativeSizes=[44 80]
                     nodes=[Relaxed(cumulativeSizes=[12 28 44]
                                    nodes=[Relaxed(cumulativeSizes=[4 8 12]
                                                   nodes=[[20 21 22 23] [24 25 26 27] [28 29 30 31]])
                                           Strict2([32 33 34 35] [36 37 38 39] [40 41 42 43] [44 45 46 47])
                                           Strict2([48 49 50 51] [52 53 54 55] [56 57 58 59] [60 61 62 63])])
                            Strict4(Strict2([64 65 66 67] [68 69 70 71] [72 73 74 75] [76 77 78 79])
                                    Strict2([80 81 82 83] [84 85 86 87] [88 89 90 91] [92 93 94 95])
                                    Strict2([96 97 98 99]))]))

On the right, only the split nodes have become Relaxed. When the resulting tree is inserted/appended to, the focus will be pushed to the front, which would (in 31 out of 32 cases) cause any Strict nodes to become relaxed at that point anyway.
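
Because the focus (whose start the `fsi` field records) can sit anywhere in the sequence rather than always at the end like PersistentVector's tail, a lookup first checks whether the index lands inside the focus window before descending the tree. A simplified model (hypothetical names; the real tree holds everything outside the focus in one root, not two arrays):

```java
public class FocusLookup {
    // Simplified model: items before the focus window, the focus itself
    // (starting at focusStartIndex, i.e. "fsi"), and items after it.
    static int get(int[] before, int[] focus, int[] after,
                   int focusStartIndex, int index) {
        if (index >= focusStartIndex && index < focusStartIndex + focus.length) {
            return focus[index - focusStartIndex];   // cheap array read
        } else if (index < focusStartIndex) {
            return before[index];                    // tree descent in the real thing
        } else {
            return after[index - focusStartIndex - focus.length];
        }
    }

    public static void main(String[] args) {
        int[] before = {0, 1, 2}; // items at indices 0..2
        int[] focus  = {3, 4};    // fsi = 3
        int[] after  = {5, 6};    // items at indices 5..6
        System.out.println(get(before, focus, after, 3, 4)); // 4 (from the focus)
        System.out.println(get(before, focus, after, 3, 6)); // 6 (from the "tree")
    }
}
```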

Relaxed Tree Split at Index 71 (of 100)

Here each inserted item is still numbered from 0 to 99, but they are inserted at random indices to yield a slightly uneven Relaxed tree.

Original

RrbTree(size=100 fsi=74 focus=[99]
        root=Relaxed(cumulativeSizes=[22 43 56 69 83 99]
                     nodes=[Relaxed(cumulativeSizes=[4 9 12 15 18 22]
                                    nodes=[[26 68 11 44] [86 70 84 77 69] [63 67 2] [37 73 45] [54 39 14] [82 24 12 27]])
                            Relaxed(cumulativeSizes=[4 9 13 17 21]
                                    nodes=[[64 59 50 51] [79 20 42 90 38] [76 33 22 10] [58 19 36 7] [89 3 56 4]])
                            Relaxed(cumulativeSizes=[3 6 9 13]
                                    nodes=[[74 15 53] [72 93 52] [71 81 18] [98 43 62 9]])
                            Relaxed(cumulativeSizes=[3 6 10 13]
                                    nodes=[[32 49 8] [78 61 31] [48 85 83 60] [34 25 5]])
                            Relaxed(cumulativeSizes=[5 9 14]
                                    nodes=[[92 88 0 47 41] [17 96 57 13] [66 21 28 91 1]])
                            Relaxed(cumulativeSizes=[4 7 11 16]
                                    nodes=[[95 40 6 23] [94 65 87] [46 55 97 35] [29 30 16 80 75]])]))

Left

RrbTree(size=71 fsi=69 focus=[92 88]
        root=Relaxed(cumulativeSizes=[22 43 56 69]
                     nodes=[Relaxed(cumulativeSizes=[4 9 12 15 18 22]
                                    nodes=[[26 68 11 44] [86 70 84 77 69] [63 67 2] [37 73 45] [54 39 14] [82 24 12 27]])
                            Relaxed(cumulativeSizes=[4 9 13 17 21]
                                    nodes=[[64 59 50 51] [79 20 42 90 38] [76 33 22 10] [58 19 36 7] [89 3 56 4]])
                            Relaxed(cumulativeSizes=[3 6 9 13]
                                    nodes=[[74 15 53] [72 93 52] [71 81 18] [98 43 62 9]])
                            Relaxed(cumulativeSizes=[3 6 10 13]
                                    nodes=[[32 49 8] [78 61 31] [48 85 83 60] [34 25 5]])]))

Right

RrbTree(size=29 fsi=0 focus=[0 47 41]
        root=Relaxed(cumulativeSizes=[10 26]
                     nodes=[Relaxed(cumulativeSizes=[5 10]
                                    nodes=[[99 17 96 57 13] [66 21 28 91 1]])
                            Relaxed(cumulativeSizes=[4 7 11 16]
                                    nodes=[[95 40 6 23] [94 65 87] [46 55 97 35] [29 30 16 80 75]])]))

Still to go:

This file is Apache licensed, so I still think you could change NODE_LENGTH_POW_2 to 5 and extract and use the working parts now if you want to. I just made that change (temporarily), then added two zeros to the size of the vector in all the tests, and the only test that failed was the debug-string test, which is hard-coded for radix=4, so that's unavoidable. Test coverage is 83%, but most of what's missing are the debugging assertions. Out of 1621 lines, if I exclude assertions and debugging strings, I count only about 20 lines with partial or no coverage. Some of those were covered when I tested with 32-length arrays.

I think it's likely to work for your insert-at-zero case now if you wanted to try it out. It should even build a strict tree for the first 1024 items when you insert at 0. I understand if you want to wait, but I thought I'd put that out there.
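
For context, NODE_LENGTH_POW_2 is the base-2 log of the node width, so the change described above moves the radix from the test value used in these traces to the production value:

```java
public class RadixWidth {
    public static void main(String[] args) {
        System.out.println(1 << 2); // 4  - NODE_LENGTH_POW_2 = 2, the test radix
        System.out.println(1 << 5); // 32 - NODE_LENGTH_POW_2 = 5, the production radix
    }
}
```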

pniederw commented 8 years ago

@GlenKPeterson I'll wait, but thanks for the update!

GlenKPeterson commented 8 years ago

FYI, UncleJim (Paguro) 2.0.5 was just pushed to Maven Central / Sonatype. It includes ImList.reverse() which may help your original issue.

Hopefully the serialization changes are winding down and I'll be able to get back to the RRB Tree before too long.

GlenKPeterson commented 8 years ago

Paguro 2.0.7 exposes the mutable versions (builders) of the 3 persistent collections that have them:

// ImList:
MutableList mutable();

// ImUnsortedMap
MutableUnsortedMap mutable();

// ImUnsortedSet
MutableUnsortedSet mutable();

The mutable interfaces all have an immutable() method to send them back.

GlenKPeterson commented 8 years ago

Paguro 2.0.13 makes those mutable inner classes Public (not the constructors, but the classes themselves) and provides static emptyMutable() factory methods. These changes work around some limitations in the Java 8 type inference algorithm so that you don't have to declare types explicitly in a few circumstances where you used to have to help Java 8 out with some extra variables, type annotations, or casts.

pniederw commented 8 years ago

Updated successfully. FWIW, the Paguro codebase still uses empty().mutable() in a couple of places.

GlenKPeterson commented 7 years ago

The mutable collections have been promoted to StaticImports. I haven't actually tried them out yet beyond achieving 100% test coverage, but they aren't complicated, so... Thanks for encouraging these. I've been using them a lot more than I expected to. https://github.com/GlenKPeterson/Paguro/commit/62218d6919107e32e31c124b98146111144b72cc#diff-13250d740064fd01bb6126fb9d949dd4L103

P.S. Note maven artifactId is now Paguro instead of UncleJim.

GlenKPeterson commented 7 years ago

The RRB-Tree join() is mostly working (tested up to 10,000 elements). I can see that split() yields uneven trees in some circumstances and join() doesn't know how to deal with that, so I think I'll fix split() to keep the perfectly-even-tree invariant. without() is implemented as just a combination of split() and join(), so it should work once split() is fixed. I think perfectly-even is an invariant we want to preserve, but unlike PersistentVector (which stores the height on the vector container, not on the individual nodes), I think it could be made to work either way. Also, I temporarily overrode ImList.concat(List) because the RRB Tree doesn't have a mutable version (yet?). Not sure how critical that will be.
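
The without()-as-split()-plus-join() composition is easy to picture on plain lists; the point of doing it on the RRB tree is that both halves come back in O(log n) instead of O(n). A sketch of the composition (on java.util.List, not the actual tree implementation):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WithoutSketch {
    // without(i) = split at i, skip the removed element, then join the halves.
    // On the RRB tree, split and join are each O(log n); here they are O(n)
    // array copies, which is the whole reason the tree version matters.
    static <T> List<T> without(List<T> v, int i) {
        List<T> left  = v.subList(0, i);            // split: left half
        List<T> right = v.subList(i + 1, v.size()); // split: right half, minus item i
        List<T> joined = new ArrayList<>(left);     // join
        joined.addAll(right);
        return joined;
    }

    public static void main(String[] args) {
        System.out.println(without(Arrays.asList(10, 20, 30, 40), 2)); // [10, 20, 40]
    }
}
```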

Still to go:

GlenKPeterson commented 7 years ago

Fixed all known issues with the RRB-Tree. Have all tests passing with radix=32 and with radix=4. I wrote a function that validates the invariants of the tree, which I called from all over the place as well as in the unit tests. That technique turned up multiple issues, and I fixed them all. I wish I'd thought to do that when I started the project instead of at the end!

split(), join() and without() all work and are O(log n). So I believe the RRB Tree could now be called "complete, correct, and working." :tada:

That said, some functions should probably be moved to another class (like the copy-on-write-Array functions and pretty-printing functions at the end of the file).

I switched to 32-item nodes and tried a quick performance test vs. PersistentVector, for 10-1M items by factors of 10, 3 warmups, 5 iterations, 2 threads. Results: Rrb is 30% slower to build at scale (did not use the mutable builder on the PersistentVector), and slower to iterate at small sizes, but faster at large sizes. Not sure what happened at 100k items, but maybe the way I walk the tree of the Rrb might have a little advantage there. So I think performance of the two data structures is ballpark similar, as expected:

Benchmark                       Mode  Cnt          Score        Error  Units
MyBenchmark.BuildRrb10         thrpt   10    5359317.457 ±  58429.240  ops/s
MyBenchmark.BuildRrb100        thrpt   10     488273.011 ±   7154.882  ops/s
MyBenchmark.BuildRrb1000       thrpt   10      42369.608 ±   1737.401  ops/s
MyBenchmark.BuildRrb10000      thrpt   10       3711.741 ±     47.830  ops/s
MyBenchmark.BuildRrb100000     thrpt   10        333.500 ±      2.157  ops/s
MyBenchmark.BuildRrb1000000    thrpt   10         31.841 ±      0.258  ops/s
MyBenchmark.BuildVec10         thrpt   10    5385747.159 ±  33301.225  ops/s
MyBenchmark.BuildVec100        thrpt   10     508121.782 ±   9926.507  ops/s
MyBenchmark.BuildVec1000       thrpt   10      48240.220 ±    977.641  ops/s
MyBenchmark.BuildVec10000      thrpt   10       4717.994 ±    108.765  ops/s
MyBenchmark.BuildVec100000     thrpt   10        453.918 ±      8.714  ops/s
MyBenchmark.BuildVec1000000    thrpt   10         43.636 ±      1.522  ops/s
MyBenchmark.IterateRrb10       thrpt   10   76411709.267 ± 561606.257  ops/s
MyBenchmark.IterateRrb100      thrpt   10    3346640.993 ±  11275.641  ops/s
MyBenchmark.IterateRrb1000     thrpt   10     485806.810 ±   1164.792  ops/s
MyBenchmark.IterateRrb10000    thrpt   10      49178.258 ±    561.418  ops/s
MyBenchmark.IterateRrb100000   thrpt   10       4965.137 ±     18.456  ops/s
MyBenchmark.IterateRrb1000000  thrpt   10        274.683 ±     10.803  ops/s
MyBenchmark.IterateVec10       thrpt   10  106755912.222 ± 198346.270  ops/s
MyBenchmark.IterateVec100      thrpt   10    6803056.306 ± 156037.273  ops/s
MyBenchmark.IterateVec1000     thrpt   10     816722.174 ±   1767.220  ops/s
MyBenchmark.IterateVec10000    thrpt   10      44577.719 ±   1034.192  ops/s
MyBenchmark.IterateVec100000   thrpt   10       1688.504 ±    206.700  ops/s
MyBenchmark.IterateVec1000000  thrpt   10        210.334 ±      2.310  ops/s

The test is checked in here: https://github.com/GlenKPeterson/Paguro/blob/2016-05-22_RRB-Tree/paguro-bench/pom.xml#L32

Iteration Update

Made some changes to the iterator and measured more carefully (-f 2 -i 5 -wi 8 vs. -wi 5 above).

Benchmark                       Mode  Cnt          Score        Error  Units
MyBenchmark.IterateRrb10       thrpt   10  105396587.541 ± 531012.422  ops/s
MyBenchmark.IterateRrb100      thrpt   10    3781984.444 ±   5067.522  ops/s
MyBenchmark.IterateRrb1000     thrpt   10     604632.536 ±   1725.977  ops/s
MyBenchmark.IterateRrb10000    thrpt   10      59499.575 ±   5696.428  ops/s
MyBenchmark.IterateRrb100000   thrpt   10       5582.703 ±    317.748  ops/s
MyBenchmark.IterateRrb1000000  thrpt   10        246.100 ±      4.562  ops/s
MyBenchmark.IterateVec10       thrpt   10  106844685.606 ± 335969.106  ops/s
MyBenchmark.IterateVec100      thrpt   10    6716592.875 ±   1610.086  ops/s
MyBenchmark.IterateVec1000     thrpt   10     815554.457 ±   1314.055  ops/s
MyBenchmark.IterateVec10000    thrpt   10      44433.149 ±   1009.075  ops/s
MyBenchmark.IterateVec100000   thrpt   10       1799.142 ±      9.862  ops/s
MyBenchmark.IterateVec1000000  thrpt   10        207.657 ±      6.495  ops/s

Comparison (RRB vs. Vector):
     10: RRB is 99% of the speed of the PersistentVector
    100: 50%
   1000: 70%
  10000: 140%
 100000: 300%
1000000: 115%

So RRB is 1/2 as fast at 100 items, but 3x as fast at 100K items and similar otherwise. "Comparable Iteration Performance" seems a fair summary.
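
The percentage comparison above is just each Rrb score divided by the matching Vec score from the second table (the figures in the thread are rounded loosely). A quick sketch of the arithmetic:

```java
public class IterRatio {
    public static void main(String[] args) {
        // Scores (ops/s) copied from the "Iteration Update" table above.
        double[] rrb = {105396587.541, 3781984.444, 604632.536, 59499.575, 5582.703, 246.100};
        double[] vec = {106844685.606, 6716592.875, 815554.457, 44433.149, 1799.142, 207.657};
        int size = 10;
        for (int i = 0; i < rrb.length; i++, size *= 10) {
            // Ratio > 100% means the RRB tree iterates faster at that size.
            System.out.printf("%7d: %.0f%%%n", size, 100 * rrb[i] / vec[i]);
        }
    }
}
```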

Still to go:

GlenKPeterson commented 7 years ago

I finished the RRB Tree and am measuring performance - see attached. Graphs have a logarithmic Y-scale. There are some notes on lines 41 and 42 of the sheet. Tests were done with JMH (the Java Microbenchmarking Harness): java -jar target/benchmarks.jar -f 2 -i 5 -wi 8

screenshot from 2017-05-20 21-57-39

In general, the RRB Tree is about the same speed as the Clojure PersistentVector (called "Vec") in these tests. It beats java.util.ArrayList (called "List") for insert-at-zero with 1K or more items (the Clojure Vector does not have this operation).

I also commented out the "Strict" nodes and ran the tests again. get() is definitely slower on "Relaxed" nodes. I double-checked relaxed at 100K and it seemed to be the correct reading. Not sure why that is faster though.

Anyway, I think I can say "similar performance to Clojure vector" and "allows efficient inserts at zero." I'm assuming efficient random inserts as well. There's probably something that can be cleaned up and made to go faster.

Code is here: https://github.com/GlenKPeterson/Paguro/blob/2016-05-22_RRB-Tree/src/main/java/org/organicdesign/fp/collections/RrbTree.java

Tests are here: https://github.com/GlenKPeterson/Paguro/tree/2016-05-22_RRB-Tree/paguro-bench

If you do a pull, make sure to get the 2016-05-22_RRB-Tree branch. I guess I finished in under a year, but only by 2 days!

GlenKPeterson commented 7 years ago

I'm closing this ticket as I'm already using the 3.0 branch with the RRB Tree in it. It will be released once all the kinks are worked out, probably in a month?

GlenKPeterson commented 7 years ago

OK, so 3 1/2 months instead of 1, but 3.0 is released with the RRB-Tree in the master branch of this project. I've uploaded it to Sonatype / Maven Central and there are JavaDocs now too. Thanks for everyone's thoughts and encouragement!