brandonbloom / fipp

Fast Idiomatic Pretty Printer for Clojure
525 stars 44 forks source link

Benchmark RRB Vectors vs 2-3 Finger Trees #6

Closed brandonbloom closed 9 years ago

brandonbloom commented 11 years ago

https://github.com/clojure/core.rrb-vector

Bonus: There is currently a cljs implementation of rrb-vectors.

brandonbloom commented 11 years ago

@jonase Were you gonna take a run at this?

I know @ibdknox would be happy to see it :-)

jonase commented 11 years ago

@brandonbloom I'd like to see fipp ported to cljs. If this is the shortest path to that end I'll give it a try (sometime next week since I'm currently travelling).

michalmarczyk commented 11 years ago

I've wired in core.rrb-vector in place of finger trees in my fork, in the rrb-vector branch. The conj to left function I'm using is simply concat with a singleton vector (direct conj to left is not exported by core.rrb-vector yet).

Running benchmarks at a REPL started with rlwrap java -cp $(lein classpath) clojure.main produces the following results for me (skipping prn and clojure.pprint):

(require '[fipp.benchmark :as fb])

;; master
(fb/bench-all 100 (vec (range 1000)))
#<edn$pprint fipp.edn$pprint@5a12d46c>
"Elapsed time: 1678.482498 msecs"

(fb/bench-all 1000 (random-map))
#<edn$pprint fipp.edn$pprint@5a12d46c>
"Elapsed time: 1834.286289 msecs"

;; rrb-vector
(fb/bench-all 100 (vec (range 1000)))
#<edn$pprint fipp.edn$pprint@e9cad1d>
"Elapsed time: 1577.518518 msecs"

(fb/bench-all 1000 (fb/random-map))
#<edn$pprint fipp.edn$pprint@e9cad1d>
"Elapsed time: 1680.373086 msecs"

So there appears to be a small speedup with the naive concatenation-based conjl in these benchmark cases.

As mentioned on IRC, I expect to be able to make this usage scenario much faster with a custom conj to left operation; also, the implementation of concatenation in core.rrb-vector is definitely not tuned for high performance yet -- needless to say that's a major development goal. In the near future, I guess I'll be bumping conj to left higher up the TODO list.

michalmarczyk commented 11 years ago

Oh, also: lein test seems happy and pretty printing small data structures at the REPL works as expected AFAICT; haven't done much testing beyond that so far.

brandonbloom commented 11 years ago

Cool! Thanks for doing this.

Unfortunately, my little "random-map" benchmark is too variable to be a useful test. It's an extra bad benchmark, since it started out life as a mini-unit test. However, it seemed like the rrb-vectors were a tiny bit slower in many cases; not a clear win on my box. I really need to invest some time into a few more robust benchmarks and an easier way to test them on multiple builds...

In any case, I'd be very happy to take this via a pull request once you've achieved your planned conjl optimizations. Even if the gain is pretty small.

brandonbloom commented 11 years ago

Cool! Thanks for doing this.

Unfortunately, my little "random-map" benchmark is too variable to be a useful test. It's an extra bad benchmark, since it started out life as a mini-unit test. However, it seemed like the rrb-vectors were a tiny bit slower in many cases; not a clear win on my box. I really need to invest some time into a few more robust benchmarks and an easier way to test them on multiple builds...

In any case, I'd be very happy to take this via a pull request once you've achieved your planned conjl optimizations. Even if the gain is pretty small.

michalmarczyk commented 11 years ago

Cool. I've been thinking about fipp's use case and I have some ideas for speeding things up that I want to investigate, some involving RRB vectors, some not. I'll keep you posted as I make progress on these.

michalmarczyk commented 11 years ago

About the ClojureScript port -- I've started looking into what that would involve:

Deques could be implemented using core.rrb-vector or @wagjo's data-cljs, which provides a ClojureScript implementation of finger trees based on data.finger-tree, so this is not a problem.

Clearly transduce would need to be ported; also not a problem AFAICT.

What is a problem, however, is that in ClojureScript you can't extend-protocol to something meaning "any kind of vector / set / map", since these concepts correspond to protocols and there's no way to provide an implementation of a protocol given the presence of an implementation of another protocol. Extending to all the built-in types is (1) fragile -- concrete seq types, for example, come and go all the time; (2) unsatisfactory anyway, since it means that any new impl of any of the core protocols would need to provide its own impl of IPretty (and depend on fipp, I suppose). CLJS-527 could help, I suppose; I'll post a link to this ticket over there. (It could help in that one could run as the default impl of IPretty, using extend-type to install more appropriate implementations.)

brandonbloom commented 11 years ago

There are a few separate issues here.

I've created #7 to track CLJS work and #8 tracks improvements to the pretty-doc-ification process.

brandonbloom commented 11 years ago

Work is now on this branch: https://github.com/brandonbloom/fipp/tree/rrb

brandonbloom commented 10 years ago

Any news on RRB perf? :-)

wagjo commented 10 years ago

There is a bit aging CLJS benchmark which includes common RRB operations (look for large vectors)

brandonbloom commented 9 years ago

If it's straightforward to maintain (with cljx or whatever), I'm OK with the CLJ version using finger trees and the CLJS version using rrb vectors. Whatever is easiest to get the CLJS version out.

However, I'm also happy to switch to whichever one benchmarks as faster. I just won't bother switching unless there is compelling evidence to do so.

brandonbloom commented 9 years ago

I did some more thorough benchmarking on the latest version of Fipp. For structures that are not deeply nested, such as (vec (range n)), the data structure doesn't matter much at all, since it's rarely ever got more than a single element in it. However, for more realistic mixed nested structures, like edn data or clj source code, rrb vectors seem twice as fast, even without conjl and popl optimizations (I'm using catvec and subvec right now).

So... Fipp 0.6.1 will move to rrb vectors, eliminating the use of finger trees!

Sorry about the delay on this work here. Thanks all for the discussion and related work!