Kotlin / kotlinx.collections.immutable

Immutable persistent collections for Kotlin
Apache License 2.0
1.19k stars 63 forks source link

CHAMP algorithm is current state of the art for immutable/persistent collections #6

Closed mikehearn closed 5 years ago

mikehearn commented 8 years ago

I have nothing against using the PCollections library but the CHAMP algorithm is the current state of the art for JVM based immutable collection implementations:

https://blog.acolyer.org/2015/11/27/hamt/

You can find an implementation here:

https://github.com/usethesource/capsule

The footprint improvements are quite impressive. It'd be a shame if Kotlin shipped a library that wasn't using the current best research into the field, especially as an implementation already exists. It'd be a nice competitive advantage over other languages.

hastebrot commented 8 years ago

This would be very great.

Just for reference: A while ago I found this older implementation of a HAMT in Kotlin: https://github.com/KenBarclay/KData I'm not quite sure if this uses the CHAMP algorithm.

On Jul 21, 2016 12:36 PM, "Mike Hearn" notifications@github.com wrote:

I have nothing against using the PCollections library but the CHAMP algorithm is the current state of the art for JVM based immutable collection implementations:

https://blog.acolyer.org/2015/11/27/hamt/

You can find an implementation here:

https://github.com/usethesource/capsule

The footprint improvements are quite impressive. It'd be a shame if Kotlin shipped a library that wasn't using the current best research into the field, especially as an implementation already exists. It'd be a nice competitive advantage over other languages.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Kotlin/kotlinx.collections.immutable/issues/6, or mute the thread https://github.com/notifications/unsubscribe-auth/AAeSf6vaCm6aJxeP6QnTF94Eui-8dVBNks5qX0uvgaJpZM4JRqbW .

ilya-g commented 8 years ago

@mikehearn as far as I understand, CHAMP is a data structure suitable for implementing hash maps and sets.

Do you know, is it possible to use CHAMP for ordered sets and maps (those that preserve element insertion order) and indexed lists?

pniederw commented 8 years ago

Last time I checked PCollections, it seemed to be considerably less state-of-the-art than Clojure's and Scala's persistent collections.

@ilya-g The author of capsule answered my questions on the scope of the project here: https://github.com/usethesource/capsule/issues/2

At this time capsule feels more like a research prototype to me, and the project hasn't had much activity lately. However the author will deliver a talk at the upcoming JVMLS.

The most state-of-the-art persistent collections library implemented in Java and (supposedly) production-ready that I could find is https://github.com/GlenKPeterson/UncleJim. The starting point for this library was Clojure's persistent collections implementation. The author is also working on a new and improved persistent vector: https://github.com/GlenKPeterson/UncleJim/tree/2016-05-22_RRB-Tree The author's own comparison to PCollections is here: https://github.com/GlenKPeterson/UncleJim/wiki/UncleJim-vs.-PCollections

mikehearn commented 8 years ago

I don't know the answer to that, sorry.

msteindorfer commented 8 years ago

As the author of the CHAMP paper and the usethesource/capsule library, I want to jump in an answer some questions you're having.

@ilya-g, yes CHAMP can be used to implement hash-sets or hash-maps that preserve element insertion order. I implemented such a data structure last year for another project and now started to back port that data structure to usethesource/capsule. You can expect the implementation to appear in the codebase by September.

@ilya-g, @mikehearn, indeed, CHAMP is suitable for implementing hashed data structures. For implementing immutable indexed lists, the best candidate is currently the RRB Vector. Sorted data structures (e.g., that take a comparator) are probably still implemented best by a balanced binary tree (e.g., such as a red-black tree).

@pniederw et al., usethesource/capsule is more than a research prototype. It powers the builtin data structures of the http://www.rascal-mpl.org programming language. But were you're right is that it still needs polishing and work to make it more appealing to wider audience. From September on, I'll have more time for engineering than for research, therefore you can expected a lot of activity on the project.

Overall the aim of usethesource/capsule is to be a vehicle for implementing cutting edge research results on immutable data structures, and the current focus lies on hashed collections. I'm happy to discuss my roadmap with you and to see how common interests can be aligned.

GlenKPeterson commented 8 years ago

I really think the Clojure collections should be considered. My understanding is that Scala collections grew out of Rich Hickey's implementations of Okasaki's Purely Functional Data Structures and Bagwell/Rompf's papers. Hickey had many critical insights and brought Okasaki's ideas into the mainstream (at least for the JVM world) by adding some of his own and providing a highly performant, working implementation.

We've been using Paguro's type-safe versions of the Clojure collections in production for about 2 years now. When we find bugs, it's in the code that doesn't use these collections. I've been using Paguro in Kotlin as-is for about 3 or 4 months now. We could add some NotNull annotations, but that's about all I see necessary for ideal Kotlin use right now.

Even if you aren't interested in Paguro, I think the original Clojure collections deserve mention on the front page of this project. Without them, there might not be the Scala ones. And Clojure's transformation api is beautifully simple.

norswap commented 8 years ago

I'm adding my vote for CHAMP.

I implemented a immutable map in Kotlin using the approach: https://github.com/norswap/triemap

It's a learning project: It doesn't have the niceties that capsule nor compatibility with the Java collections API. But it should give you a sense of what's involved. Once you've grasped the basics, the implementation isn't terribly hard.

danieldietrich commented 7 years ago

Hi, Daniel chiming in :)

Disclaimer: I'm the creator of Javaslang

The ideas presented here are very interesting.

I'm not sure if the best way to provide interoperability with standard Java collections is to implement java.util.* collection interfaces. The mutator methods (i.e. the ones that return void) can only throw at runtime - which is unsafe. Javaslang's persistent collections for example will provide views for that purpose.

Javaslang was created to bring a big portion of the persistent/immutable part of Scala to Java (8). We benchmark our collections against Scala, Clojure, PCollection, Functional Java, Eclipse Collections and Java mutable collections. Here is how our Vector performs for example (> 1 \:= faster than).

We based our Hash collections on Hash Array Mapped Tree (HAMT), the Sorted collections on a Red/Back-Tree and the Vector on a Bit Mapped Trie (BMT). This is the collection hierarchy so far (upcoming 2.1.0):

javaslang-2-1-0-alpha

Here is a comparison of the APIs of two persistent Vector implementations.

Left: Javaslang / Right: PCollections

javaslang-vs-pcollections

l0rinc commented 7 years ago

Here is how our Vector performs for example (> 1 := faster than).

To make it clear what I meant by PCollections being slower:

Operation   Ratio                                                 10      100     1026 
Create      slang_persistent/pcollections_persistent          29.70×  114.82×  243.27×
Head        slang_persistent/pcollections_persistent           1.75×    2.43×    2.81×
Tail        slang_persistent/pcollections_persistent           4.38×    6.00×    7.59×
Get         slang_persistent/pcollections_persistent           4.86×    6.08×    8.99×
Update      slang_persistent/pcollections_persistent           1.73×    2.34×    3.07×
Prepend     slang_persistent/pcollections_persistent           0.32×    0.74×    1.31×
Append      slang_persistent/pcollections_persistent           0.51×    0.74×    1.09×
Slice       slang_persistent/pcollections_persistent           0.53×    0.38×    0.40×
Iterate     slang_persistent/pcollections_persistent          13.31×   10.68×    8.02×

i.e. except for slice (which has a memory leak in Java, Clojure and PCollections, i.e. a wrapper over all elements), Javaslang is several times faster, consistently.

I think Kotlin is a very nice new language, we would be glad to incorporate techniques from CHAMP into Javaslang (where applicable), if that would help its Kotlin integration :)

zvozin commented 7 years ago

@danieldietrich I, for one, have been using Javaslang in Kotlin a while now as is and loving it. A wrapper that turns static utility methods like

Try<Seq<T>> sequence(Iterable<? extends Try<? extends T>> values)

into extensions is all that I would wish for, and would in fact volunteer to come up with such a package myself - I have half of it spread across my projects already anyway :).

l0rinc commented 7 years ago

@zvozin, I think that would be a very valuable contribution! :)

danieldietrich commented 7 years ago

@zvozin That sounds really great! I will setup an empty Github repository javaslang-kotlin and invite you (with write privileges).

Shusek commented 6 years ago

Any update for it? It can be awesome feature if kotlin collection do not require copy all collection when 'add' operator is called in list.

ilya-g commented 6 years ago

@Shusek There's currently work in progress in the PR #20 with new implementations of persistent collections. The PersistentHashMap implementation is based on CHAMP, iirc.

Feel free to join the review if you're interested.

ilya-g commented 5 years ago

PersistentHashMap implementation in 0.2 uses CHAMP data structure.