Closed mikehearn closed 5 years ago
This would be very great.
Just for reference: A while ago I found this older implementation of a HAMT in Kotlin: https://github.com/KenBarclay/KData I'm not quite sure if this uses the CHAMP algorithm.
On Jul 21, 2016 12:36 PM, "Mike Hearn" notifications@github.com wrote:
I have nothing against using the PCollections library but the CHAMP algorithm is the current state of the art for JVM based immutable collection implementations:
https://blog.acolyer.org/2015/11/27/hamt/
You can find an implementation here:
https://github.com/usethesource/capsule
The footprint improvements are quite impressive. It'd be a shame if Kotlin shipped a library that wasn't using the current best research into the field, especially as an implementation already exists. It'd be a nice competitive advantage over other languages.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Kotlin/kotlinx.collections.immutable/issues/6, or mute the thread https://github.com/notifications/unsubscribe-auth/AAeSf6vaCm6aJxeP6QnTF94Eui-8dVBNks5qX0uvgaJpZM4JRqbW .
@mikehearn as far as I understand, CHAMP is a data structure suitable for implementing hash maps and sets.
Do you know, is it possible to use CHAMP for ordered sets and maps (those that preserve element insertion order) and indexed lists?
Last time I checked PCollections, it seemed to be considerably less state-of-the-art than Clojure's and Scala's persistent collections.
@ilya-g The author of capsule answered my questions on the scope of the project here: https://github.com/usethesource/capsule/issues/2
At this time capsule feels more like a research prototype to me, and the project hasn't had much activity lately. However the author will deliver a talk at the upcoming JVMLS.
The most state-of-the-art persistent collections library implemented in Java and (supposedly) production-ready that I could find is https://github.com/GlenKPeterson/UncleJim. The starting point for this library was Clojure's persistent collections implementation. The author is also working on a new and improved persistent vector: https://github.com/GlenKPeterson/UncleJim/tree/2016-05-22_RRB-Tree The author's own comparison to PCollections is here: https://github.com/GlenKPeterson/UncleJim/wiki/UncleJim-vs.-PCollections
I don't know the answer to that, sorry.
As the author of the CHAMP paper and the usethesource/capsule library, I want to jump in an answer some questions you're having.
@ilya-g, yes CHAMP can be used to implement hash-sets or hash-maps that preserve element insertion order. I implemented such a data structure last year for another project and now started to back port that data structure to usethesource/capsule. You can expect the implementation to appear in the codebase by September.
@ilya-g, @mikehearn, indeed, CHAMP is suitable for implementing hashed data structures. For implementing immutable indexed lists, the best candidate is currently the RRB Vector. Sorted data structures (e.g., that take a comparator) are probably still implemented best by a balanced binary tree (e.g., such as a red-black tree).
@pniederw et al., usethesource/capsule is more than a research prototype. It powers the builtin data structures of the http://www.rascal-mpl.org programming language. But were you're right is that it still needs polishing and work to make it more appealing to wider audience. From September on, I'll have more time for engineering than for research, therefore you can expected a lot of activity on the project.
Overall the aim of usethesource/capsule is to be a vehicle for implementing cutting edge research results on immutable data structures, and the current focus lies on hashed collections. I'm happy to discuss my roadmap with you and to see how common interests can be aligned.
I really think the Clojure collections should be considered. My understanding is that Scala collections grew out of Rich Hickey's implementations of Okasaki's Purely Functional Data Structures and Bagwell/Rompf's papers. Hickey had many critical insights and brought Okasaki's ideas into the mainstream (at least for the JVM world) by adding some of his own and providing a highly performant, working implementation.
We've been using Paguro's type-safe versions of the Clojure collections in production for about 2 years now. When we find bugs, it's in the code that doesn't use these collections. I've been using Paguro in Kotlin as-is for about 3 or 4 months now. We could add some NotNull annotations, but that's about all I see necessary for ideal Kotlin use right now.
Even if you aren't interested in Paguro, I think the original Clojure collections deserve mention on the front page of this project. Without them, there might not be the Scala ones. And Clojure's transformation api is beautifully simple.
I'm adding my vote for CHAMP.
I implemented a immutable map in Kotlin using the approach: https://github.com/norswap/triemap
It's a learning project: It doesn't have the niceties that capsule nor compatibility with the Java collections API. But it should give you a sense of what's involved. Once you've grasped the basics, the implementation isn't terribly hard.
Hi, Daniel chiming in :)
Disclaimer: I'm the creator of Javaslang
The ideas presented here are very interesting.
I'm not sure if the best way to provide interoperability with standard Java collections is to implement java.util.* collection interfaces. The mutator methods (i.e. the ones that return void) can only throw at runtime - which is unsafe. Javaslang's persistent collections for example will provide views for that purpose.
Javaslang was created to bring a big portion of the persistent/immutable part of Scala to Java (8). We benchmark our collections against Scala, Clojure, PCollection, Functional Java, Eclipse Collections and Java mutable collections. Here is how our Vector performs for example (> 1 \:= faster than).
We based our Hash collections on Hash Array Mapped Tree (HAMT), the Sorted collections on a Red/Back-Tree and the Vector on a Bit Mapped Trie (BMT). This is the collection hierarchy so far (upcoming 2.1.0):
Here is a comparison of the APIs of two persistent Vector implementations.
Here is how our Vector performs for example (> 1 := faster than).
To make it clear what I meant by PCollections
being slower:
Operation Ratio 10 100 1026
Create slang_persistent/pcollections_persistent 29.70× 114.82× 243.27×
Head slang_persistent/pcollections_persistent 1.75× 2.43× 2.81×
Tail slang_persistent/pcollections_persistent 4.38× 6.00× 7.59×
Get slang_persistent/pcollections_persistent 4.86× 6.08× 8.99×
Update slang_persistent/pcollections_persistent 1.73× 2.34× 3.07×
Prepend slang_persistent/pcollections_persistent 0.32× 0.74× 1.31×
Append slang_persistent/pcollections_persistent 0.51× 0.74× 1.09×
Slice slang_persistent/pcollections_persistent 0.53× 0.38× 0.40×
Iterate slang_persistent/pcollections_persistent 13.31× 10.68× 8.02×
i.e. except for slice
(which has a memory leak in Java
, Clojure
and PCollections
, i.e. a wrapper over all elements), Javaslang
is several times faster, consistently.
I think Kotlin
is a very nice new language, we would be glad to incorporate techniques from CHAMP
into Javaslang
(where applicable), if that would help its Kotlin
integration :)
@danieldietrich I, for one, have been using Javaslang in Kotlin a while now as is and loving it. A wrapper that turns static utility methods like
Try<Seq<T>> sequence(Iterable<? extends Try<? extends T>> values)
into extensions is all that I would wish for, and would in fact volunteer to come up with such a package myself - I have half of it spread across my projects already anyway :).
@zvozin, I think that would be a very valuable contribution! :)
@zvozin That sounds really great! I will setup an empty Github repository javaslang-kotlin and invite you (with write privileges).
Any update for it? It can be awesome feature if kotlin collection do not require copy all collection when 'add' operator is called in list.
@Shusek There's currently work in progress in the PR #20 with new implementations of persistent collections. The PersistentHashMap
implementation is based on CHAMP, iirc.
Feel free to join the review if you're interested.
PersistentHashMap implementation in 0.2 uses CHAMP data structure.
I have nothing against using the PCollections library but the CHAMP algorithm is the current state of the art for JVM based immutable collection implementations:
https://blog.acolyer.org/2015/11/27/hamt/
You can find an implementation here:
https://github.com/usethesource/capsule
The footprint improvements are quite impressive. It'd be a shame if Kotlin shipped a library that wasn't using the current best research into the field, especially as an implementation already exists. It'd be a nice competitive advantage over other languages.