bsless / clj-fast

Unpredictably faster Clojure
Eclipse Public License 2.0
234 stars 1 forks source link

May not be in scope, but fast detection of set intersection #4

Open joinr opened 4 years ago

joinr commented 4 years ago

I ran into a use case for a work product that needed to quickly test membership between 2 sets to see if "any" member existed in both, and return the first intersecting member as fast as possible.

A naive way one might reach for is

(defn slow-some-member [s1 s2]
  (first (clojure.set/intersection s1 s2)))

Original implementation was

(defn some-member [s1 s2]  
  (let [[l r]  (if (< (count s1) (count s2)) [s1 s2]
                   [s2 s1])]
    (reduce (fn [acc x]
              (if (r x)
                (reduced x)
                acc)) nil l)))

Which of course has some inefficiencies due to destructuring and slow function calls through clojure.lang.RT.

Revised version assumes sets as input,

(defn some-member2 [^clojure.lang.APersistentSet s1
                    ^clojure.lang.APersistentSet s2]
  (let [l  (if (< (.count s1) (.count s2)) s1
               s2)
        r   (if (identical? l s1) s2 s1)]
    (reduce (fn [acc x]
              (if (r x)
                (reduced x)
                acc)) nil l)))
user> (let [s1 #{:a :b } s2 #{:g :a}] (c/quick-bench (slow-some-member s1 s2)))
Evaluation count : 975378 in 6 samples of 162563 calls.
             Execution time mean : 612.287581 ns
    Execution time std-deviation : 3.310927 ns
   Execution time lower quantile : 608.261646 ns ( 2.5%)
   Execution time upper quantile : 615.629475 ns (97.5%)
                   Overhead used : 2.233057 ns
nil

user> (let [s1 #{:a :b } s2 #{:g :a}] (c/quick-bench (some-member s1 s2)))
Evaluation count : 2004582 in 6 samples of 334097 calls.
             Execution time mean : 300.056678 ns
    Execution time std-deviation : 2.946006 ns
   Execution time lower quantile : 297.321682 ns ( 2.5%)
   Execution time upper quantile : 303.695844 ns (97.5%)
                   Overhead used : 2.233057 ns
nil
user> (let [s1 #{:a :b } s2 #{:g :a}] (c/quick-bench (some-member2 s1 s2)))
Evaluation count : 3298134 in 6 samples of 549689 calls.
             Execution time mean : 181.087016 ns
    Execution time std-deviation : 0.816367 ns
   Execution time lower quantile : 180.408265 ns ( 2.5%)
   Execution time upper quantile : 182.115366 ns (97.5%)
                   Overhead used : 2.233057 ns
nil

Gains are variable, and I haven't studied them for a wide range of set combinations. I'm interested in any faster means of doing this, although I just realized that my specific use case is amenable to memoization, which improves runtime (via memo-2) like 10x over my original "optimized" some-member.

user> (let [f (memo-2 some-member2) s1 #{:a :b } s2 #{:g :a}] (c/quick-bench (f s1 s2)))
Evaluation count : 19762416 in 6 samples of 3293736 calls.
             Execution time mean : 28.191368 ns
    Execution time std-deviation : 0.181916 ns
   Execution time lower quantile : 28.035774 ns ( 2.5%)
   Execution time upper quantile : 28.427699 ns (97.5%)
                   Overhead used : 2.233057 ns