dawranliou / emoji

Emoji for Clojure
https://cljdoc.org/d/dawran6/emoji
MIT License
21 stars 2 forks source link

Transducers for performance #6

Open dawranliou opened 4 years ago

dawranliou commented 4 years ago

Suggested by user Vincent Cantin @green-coder on Clojurians slack:

I took a look at the code and it seems that you could get better performances using transducers.

Transducers remove all the cost of the laziness and also remove intermediate buffering as well. For a library, it is often desirable.

See the blog series: https://vincent.404.taipei/clojure/build-your-own-transducer-part1/

dawranliou commented 4 years ago

I did a naive transducer implementation in dev/user.clj:

;; perf testing
;; alias-map
(defn alias-map []
  (->> emojis
       (map (juxt identity :aliases))
       (mapcat (fn [[emoji aliases]] (map (fn [alias] [(keyword alias) emoji]) aliases)))
       (into {})))

(defn alias-map-xf []
  (into {} (comp
            (map (juxt :aliases identity))
            (mapcat (fn [[aliases emoji]]
                      (map (fn [alias] [(keyword alias) emoji]) aliases))))
        emojis))

;; unicode-map
(defn unicode-map []
  (->> emojis
       (map (juxt identity :emojiChar))
       (map (fn [[emoji unicode]] [(keyword unicode) emoji]))
       (into {})))

(defn unicode-map-xf []
  (into {} (comp
            (map (juxt identity :emojiChar))
            (map (fn [[emoji unicode]] [(keyword unicode) emoji])))
        emojis))

(comment
  (require '[criterium.core :as criterium])

  (criterium/quick-benchmark* (alias-map))

  "
Evaluation count : 246 in 6 samples of 41 calls.
             Execution time mean : 2.592084 ms
    Execution time std-deviation : 152.224926 µs
   Execution time lower quantile : 2.345529 ms ( 2.5%)
   Execution time upper quantile : 2.748554 ms (97.5%)
                   Overhead used : 10.789885 ns
Found 1 outliers in 6 samples (16.6667 %)
  low-severe     1 (16.6667 %)
 Variance from outliers : 14.5692 % Variance is moderately inflated by outliers
"

  (criterium/quick-bench (alias-map-xf))

  "
Evaluation count : 246 in 6 samples of 41 calls.
             Execution time mean : 2.737474 ms
    Execution time std-deviation : 108.781029 µs
   Execution time lower quantile : 2.624297 ms ( 2.5%)
   Execution time upper quantile : 2.893353 ms (97.5%)
                   Overhead used : 10.789885 ns
Found 1 outliers in 6 samples (16.6667 %)
  low-severe     1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
"

  (criterium/quick-bench (unicode-map))
  "
Evaluation count : 732 in 6 samples of 122 calls.
             Execution time mean : 821.710403 µs
    Execution time std-deviation : 22.457106 µs
   Execution time lower quantile : 787.478180 µs ( 2.5%)
   Execution time upper quantile : 841.088796 µs (97.5%)
                   Overhead used : 10.789885 ns
"

  (criterium/quick-bench (unicode-map-xf))
  "
Evaluation count : 558 in 6 samples of 93 calls.
             Execution time mean : 1.051565 ms
    Execution time std-deviation : 27.120888 µs
   Execution time lower quantile : 1.008676 ms ( 2.5%)
   Execution time upper quantile : 1.076938 ms (97.5%)
                   Overhead used : 10.789885 ns
"

Any comment is welcome.

green-coder commented 4 years ago

I don't know why it is slower. I am surprised too.

green-coder commented 4 years ago

How is the performance if you use mapv instead of map in the transducer?

(defn alias-map-xf []
  (into {} (comp
            (map (juxt :aliases identity))
            (mapcat (fn [[aliases emoji]]
                      (mapv (fn [alias] [(keyword alias) emoji]) aliases))))
        emojis))
dawranliou commented 4 years ago

It's faster!

Evaluation count : 300 in 6 samples of 50 calls.
             Execution time mean : 2.294920 ms
    Execution time std-deviation : 206.740517 µs
   Execution time lower quantile : 2.115987 ms ( 2.5%)
   Execution time upper quantile : 2.616227 ms (97.5%)
                   Overhead used : 10.780594 ns

Found 1 outliers in 6 samples (16.6667 %)
    low-severe   1 (16.6667 %)
 Variance from outliers : 15.7753 % Variance is moderately inflated by outliers
green-coder commented 4 years ago

Ideally, you could avoid having an intermediary vector built by mapv by implementing your own custom transducer - in case you love puzzle. I am sure that you can get a faster runtime. Hint: use reduce on the aliases to transmit each alias directly into the inner reducer.

You could also remove (map (juxt :aliases identity)) for faster performance and just use a let.

dawranliou commented 4 years ago

Thanks for the hints. I'll give it a try.