cgrand / xforms

Extra transducers and reducing fns for Clojure(script)
573 stars 32 forks source link

x/into is slow when from is a small map #46

Closed imrekoszo closed 2 years ago

imrekoszo commented 2 years ago

"... and n is usually small"
(from Rob Pike's 3rd rule of programming)

x/into currently has a performance problem when from is a map due to x/kvreducible? doing a satisfies? check which is currently relatively slow.

For large maps (as in the examples in the readme) the benefits outweigh the perf hit but for small maps x/into performs much worse than plain into. In the following benchmark (examples taken from the lib's readme), using x/into means rougly a x10 execution time for a 10-key map and x2 for a 100-key map.

(Using criterium with some custom reporting to produce a markdown table)

(require '[clojure.string :as str]
         '[clojure.template :refer [do-template]]
         '[criterium.core :as crit]
         '[net.cgrand.xforms :as x])

(defn bench [map-sizes]
   (let [os-details (crit/os-details)
         runtime-details (crit/runtime-details)]
      (println "```text")
      (println "Clojure version:" (:clojure-version-string runtime-details))
      (apply println ((juxt :vm-name :vm-version) runtime-details))
      (apply println
             (-> ((juxt :arch :name :version :available-processors) os-details)
                 (conj "cpu(s)")))
      (println "```")

      (println)

      (println "| scenario \\ map size |" (str/join " | " map-sizes) "|")
      (apply println "| --- |" (repeat (count map-sizes) "--- |"))

      (do-template
         [testcase form]
         (do
            (print "|" testcase)
            (doseq [map-size map-sizes]
               (let [m (zipmap (range map-size) (range))
                     results (crit/quick-benchmark form nil)
                     mean-str (with-out-str (crit/report-point-estimate "Mean" (:mean results)))
                     dev-str (with-out-str (crit/report-point-estimate-sqrt "Std dev" (:variance results)))]
                  (print " |" (str/trim mean-str) "<br/>" (str/trim dev-str))))
            (println " |"))

         "into with lazy sequence"
         (into {} (for [[k v] m] [k (inc v)]))

         "into with mapping transducer"
         (into {} (map (fn [[k v]] [k (inc v)])) m)

         "into with x/for (pairs are still allocated)"
         (into {} (x/for [[k v] _] [k (inc v)]) m)

         "x/into with x/for (no pairs are allocated)"
         (x/into {} (x/for [[k v] _] [k (inc v)]) m))))

;; run benchmark
(bench [10 100 200 400 1000 100000])

Benchmarks with Clojure 1.11 and 1.10:

Clojure version: 1.11.1
OpenJDK 64-Bit Server VM 17.0.3+7
aarch64 Mac OS X 12.4 10 cpu(s)
scenario \ map size 10 100 200 400 1000 100000
into with lazy sequence Mean : 1.348952 µs
Std dev : 24.861988 ns
Mean : 13.568244 µs
Std dev : 317.645395 ns
Mean : 29.764020 µs
Std dev : 525.675706 ns
Mean : 61.288493 µs
Std dev : 1.259576 µs
Mean : 163.532814 µs
Std dev : 3.289248 µs
Mean : 20.440507 ms
Std dev : 1.270464 ms
into with mapping transducer Mean : 1.049186 µs
Std dev : 20.402502 ns
Mean : 8.511076 µs
Std dev : 138.617196 ns
Mean : 18.426853 µs
Std dev : 416.586450 ns
Mean : 41.802255 µs
Std dev : 546.905266 ns
Mean : 122.938324 µs
Std dev : 851.322308 ns
Mean : 13.708227 ms
Std dev : 489.532061 µs
into with x/for (pairs are still allocated) Mean : 1.150028 µs
Std dev : 47.451763 ns
Mean : 9.077368 µs
Std dev : 163.565320 ns
Mean : 18.984716 µs
Std dev : 448.219432 ns
Mean : 38.363326 µs
Std dev : 651.510255 ns
Mean : 114.864841 µs
Std dev : 1.789141 µs
Mean : 14.376883 ms
Std dev : 454.901954 µs
x/into with x/for (no pairs are allocated) Mean : 13.989836 µs
Std dev : 146.613406 ns
Mean : 18.905237 µs
Std dev : 698.227692 ns
Mean : 23.867737 µs
Std dev : 714.131155 ns
Mean : 35.883011 µs
Std dev : 851.546315 ns
Mean : 87.452106 µs
Std dev : 1.271434 µs
Mean : 8.989663 ms
Std dev : 341.449485 µs
Clojure version: 1.10.3
OpenJDK 64-Bit Server VM 17.0.3+7
aarch64 Mac OS X 12.4 10 cpu(s)
scenario \ map size 10 100 200 400 1000 100000
into with lazy sequence Mean : 1.495262 µs
Std dev : 30.500522 ns
Mean : 13.515712 µs
Std dev : 155.138059 ns
Mean : 29.165906 µs
Std dev : 316.817925 ns
Mean : 60.796637 µs
Std dev : 1.210076 µs
Mean : 161.389040 µs
Std dev : 4.449719 µs
Mean : 21.274981 ms
Std dev : 827.855967 µs
into with mapping transducer Mean : 957.301138 ns
Std dev : 10.706939 ns
Mean : 8.360890 µs
Std dev : 274.215662 ns
Mean : 17.464455 µs
Std dev : 401.741516 ns
Mean : 38.597399 µs
Std dev : 667.340516 ns
Mean : 114.857688 µs
Std dev : 694.415295 ns
Mean : 13.344460 ms
Std dev : 612.973823 µs
into with x/for (pairs are still allocated) Mean : 1.134900 µs
Std dev : 26.532268 ns
Mean : 8.831915 µs
Std dev : 202.769162 ns
Mean : 18.131562 µs
Std dev : 646.563150 ns
Mean : 37.448493 µs
Std dev : 454.634040 ns
Mean : 108.474748 µs
Std dev : 703.122080 ns
Mean : 14.270699 ms
Std dev : 572.705985 µs
x/into with x/for (no pairs are allocated) Mean : 13.733844 µs
Std dev : 151.339935 ns
Mean : 18.738272 µs
Std dev : 279.121684 ns
Mean : 24.582692 µs
Std dev : 651.817443 ns
Mean : 37.827216 µs
Std dev : 110.424603 ns
Mean : 96.097221 µs
Std dev : 2.621397 µs
Mean : 11.035179 ms
Std dev : 441.365143 µs