bsless / clj-fast

Unpredictably faster Clojure
Eclipse Public License 2.0
234 stars 1 forks source link

fast-case #8

Open joinr opened 4 years ago

joinr commented 4 years ago

I ran across this a while back, when optimizing for some identity-based comparisons. It reared its head again when I was optimizing clojure's defrecord during the icfpc 2019 optimization exercise, specifically based on how clojure.core/defrecord handles its implementations of valAt, assoc, and a (lack of) IFn lookup style implementation. I'll post the alternative I'm working with (fastrecord) in another issue. Here though are some optimizations for clojure.core/case that leverage efficient identical? checks in the case where you have all keywords.

clojure.core/case already internally optimizes its lookup scheme based on the test constants. I haven't dug into the int test case (I assume it's optimal, why not). The other cases are if you have keyword test cases, or otherwise structural hasheq constants that can be tested by hashing.

In the case of records, clojure uses clojure.core/case to dispatch based on the keyword associated with the field for many operations, which will go down the identical? lookup path. This is good in general, with a substantial caveat. For test case cardinality <=20, it is empirically faster to do a linear scan through the test cases and check identical? rather than the scheme clojure.core/case uses, which is to look up in a map (I "think" a persistent one, not sure).

This means, for small sets of test constants, you always pay the price of lookup into the hash. You also eschew any possible short circuiting opportunities which may arise by design or naturally from the data.

My alternative is twofold: a macro that establishes a case-like form specific to identical? cases (e.g. keywords, or anything else the caller is confident that identical? is appropriate for):

(defmacro case-identical?
  "Like clojure.core/case, except instead of a lookup map, we
   use `condp` and `identical?` in an unfolding macroexpansion
   to allow fast case lookups for smaller cases where we may
   beat the o(1) cost of hashing the clojure.core/case incurs
   via its lookup map.  Some workloads are substantially (~3x)
   faster using linear lookup and `identical?` checks.

   Caller should be aware of the differences between `identical?`
   and `=` or other structural hashing comparisons.  `identical?`
   is appropriate for object (e.g. pointer) equality between
   instances, and is more restrictive than structural equality
   per `clojure.core/=`; objects may be = but not `identical?`,
   where `indentical?` objects are almost certainly `=`."
  [e & clauses]
  (let [ge      (with-meta (gensym) {:tag Object})
        default (if (odd? (count clauses))
                  (or (last clauses) ::nil)
                  `(throw (IllegalArgumentException. (str "No matching clause: " ~ge))))
        conj-flat   (fn [acc [k v]]
                      (conj acc k v))]
    (if (> 2 (count clauses))
      `(let [~ge ~e] ~default)
      (let [pairs     (->> (partition 2 clauses)
                           (reduce (fn [acc [l r]]
                                     (if (seq? l)
                                       (reduce conj acc (for [x l] [x r]))
                                       (conj acc [l r])))  []))
            dupes    (->> pairs
                          (map first)
                          frequencies
                          (filter (fn [[k v]]
                                    (> v 1)))
                          (map first))
            args     (reduce conj-flat [] pairs)]
        (when (seq dupes)
          (throw (ex-info (str "Duplicate case-identical? test constants: " (vec dupes)) {:dupes dupes})))
        `(let [~ge ~e]
           (condp identical? ~ge
             ~@(if default (conj args (case default ::nil nil default)) args)))))))

and a drop-in replacement for clojure.core/case, fast-case, which detects conditions where it would be better to use linear scans rather than hashing and lookups:

(defmacro fast-case
   "Drop-in replacement for clojure.core/case that attempts to optimize
    identical? case comparison (e.g. keywords).
    Takes an expression, and a set of clauses.

    Each clause can take the form of either:

    test-constant result-expr

    (test-constant1 ... test-constantN)  result-expr

    The test-constants are not evaluated. They must be compile-time
    literals, and need not be quoted.  If the expression is equal to a
    test-constant, the corresponding result-expr is returned. A single
    default expression can follow the clauses, and its value will be
    returned if no clause matches. If no default expression is provided
    and no clause matches, an IllegalArgumentException is thrown.

    Unlike cond and condp, fast-case does a constant-time dispatch for
    ints and non-keyword constants; the clauses are not considered
    sequentially.

    If all test cases are keywords, then fast-case will leverage an
    optimized path for `identical?` checks, where we balance the
    performance of a linear comparison of entries by object
    identity with the cost of an associative lookup and hashing
    of the case objects.  This can yield signficant savings
    for cases that are all keywords, and when there may be
    benefit for short-circuiting operations (e.g. the most
    likely case is first).

    All manner of constant expressions are acceptable in case,
    including numbers, strings,  symbols, keywords, and (Clojure)
    composites thereof. Note that since lists are used to group
    multiple constants that map to the same expression, a vector
    can be used to match a list if needed. The  test-constants
    need not be all of the same type."
  [e & clauses]
  (let [ge (with-meta (gensym) {:tag Object})
        default (if (odd? (count clauses))
                  (last clauses)
                  `(throw (IllegalArgumentException. (str "No matching clause: " ~ge))))
        conj-flat   (fn [acc [k v]]
                      (conj acc k v))]
    (if (> 2 (count clauses))
      `(let [~ge ~e] ~default)
      (let [pairs (->> (partition 2 clauses)
                       (reduce (fn [acc [l r]]
                                 (if (seq? l)
                                   (reduce conj acc (for [x l] [x r]))
                                   (conj acc [l r])))  []))]
        (if (and (every? keyword? (map first pairs))
                 (<= (count pairs) 20))
          `(case-identical? ~e ~@clauses)
          `(clojure.core/case ~e ~@clauses))))))

Up to values of 20, looking up the 20th value is still faster than O(1) hashing via clojure.core/case. Fewer values have far more substantial gains (around 2-3x for small test sets). For code that's on a hot path (like the internal implementations of clojure.core/defrecord and its lookups in valAt, assoc, etc., which dispatch to clojure.core/case off of keyword identity), this can provide substantial performance savings (which is the intent of records and similar performance accelerators after all).

(let [k :a]
  (c/quick-bench
   (fast-case k :a 0 :b 1 :c 2 :d 3 :e 4 :f 5 :g 6 :h 7 :i 9 :j 10 :k 11 :l
     12 :m 13 :n 14 :o 15 :p 16 :q 17 :r 18 :s 19 :none)))

;; Evaluation count : 108369882 in 6 samples of 18061647 calls.
;; Execution time mean : 3.791586 ns

(let [k :a]
  (c/quick-bench
   (case k :a 0 :b 1 :c 2 :d 3 :e 4 :f 5 :g 6 :h 7 :i 9 :j 10 :k 11 :l 12 :m
         13 :n 14 :o 15 :p 16 :q 17 :r 18 :s 19 :none)))

;; Evaluation count : 55105488 in 6 samples of 9184248 calls.
;; Execution time mean : 8.826299 ns

(let [k :s]
  (c/quick-bench
   (fast-case k :a 0 :b 1 :c 2 :d 3 :e 4 :f 5 :g 6 :h 7 :i 9 :j 10 :k 11 :l
              12 :m 13 :n 14 :o 15 :p 16 :q 17 :r 18 :s 19 :none)))

;; Execution time mean : 9.388600 ns

(let [k :s]
  (c/quick-bench
   (fast-case k :a 0 :b 1 :c 2 :d 3 :e 4 :f 5 :g 6 :h 7 :i 9 :j 10 :k 11 :l
              12 :m 13 :n 14 :o 15 :p 16 :q 17 :r 18 :s 19 :none)))

;; Evaluation count : 55049616 in 6 samples of 9174936 calls.
;; Execution time mean : 9.402726 ns

(let [k :a] (c/quick-bench (fast-case k :a 0 :b 1 :none)))

;; Evaluation count : 105935916 in 6 samples of 17655986 calls.
;; Execution time mean : 3.748773 ns

(let [k :missing] (c/quick-bench (fast-case k :a 0 :b 1 :none)))

;; Evaluation count : 101388984 in 6 samples of 16898164 calls.
;; Execution time mean : 4.107746 ns

(let [k :a] (c/quick-bench (case k :a 0 :b 1 :none)))

;; Evaluation count : 58064934 in 6 samples of 9677489 calls.
;; Execution time mean : 8.382319 ns

(let [k :missing] (c/quick-bench (case k :a 0 :b 1 :none)))

;; Evaluation count : 56006400 in 6 samples of 9334400 calls.
;; Execution time mean : 8.979109 ns

[edit] added a default case that doesn't flip out on nil for case-identical?

bsless commented 3 years ago

Thinking about this and #9 , what do you think will be the effects of missed branch predictions on this code? Can we engineer a test case where branch prediction isn't possible to test its effects?

joinr commented 3 years ago

I'm no expert on that, let alone knowing how engineer good degenerate cases. I guess you would have random input over the cases....no idea beyond that.

bsless commented 3 years ago

I ran the following interesting test:

(defn foo [k]
  (fast-case k :a 0 :b 1 :c 2 :d 3 :e 4 :f 5 :g 6 :h 7 :i 9 :j 10 :k 11 :l 12 :m 13 :n 14 :o 15 :p 16 :q 17 :r 18 :s 19 :none))

(defn bar [k]
  (case k :a 0 :b 1 :c 2 :d 3 :e 4 :f 5 :g 6 :h 7 :i 9 :j 10 :k 11 :l
             12 :m 13 :n 14 :o 15 :p 16 :q 17 :r 18 :s 19 :none))

(def ks [:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p :q :r :s :none])

(def lots (repeatedly 10000 #(rand-nth ks)))

(do
  (c/bench (doseq [k lots] 0))
  (c/bench (doseq [k lots] (foo k)))
  (c/bench (doseq [k lots] (bar k))))

Overhead: 571.822473 µs Fast case: 654.098578 µs, minus overhead 85 Core case: 740.666878 µs, minus overhead 170 => fast case is 2x faster, amortized .

joinr commented 3 years ago

Nice. I tend to use case quit a bit, as does some clojure internals like defrecord, so this seems generally useful.

bsless commented 3 years ago

I wander how critical it actually is. While identity case checks are 2x faster we're still talking about 4 vs 8 ns in the worst case scenario, not to mention I did not measure these influences in a wider context of a running application. Perhaps cache misses will degrade other parts' characteristics, so I'm still wary. It would be interesting to get a profile of a typical application and how many times it cases on keywords.

joinr commented 3 years ago

If you're using keyword access for records, it will impact your performance. Ended up being a enough in the IFCPC optimization case (where I first ran into this, and defrecord's weaker performance compared to arraymaps..) that I found out about this. If you're not using idiomatic paths that leverage case on hot paths, it may not matter then. That can probably be applied to most of these optimizations though (e.g. context and profiling matter).

bsless commented 3 years ago

Is there a need to introduce the binding to ge and type hint it as an Object? I fail to see the ratilnale behind it. Am I missing anything?

joinr commented 3 years ago

That came from the original source in clojure.core/case. Unsure.