Closed joinr closed 3 years ago
Managed to find some meat while maintaining portability. Anything else sticks out?
(defn -rsubseq
([^clojure.lang.Sorted sc test key]
(let [include (mk-bound-fn sc test key)]
(if (or (identical? < test) (identical? <= test))
(when-let [s (. sc seqFrom key false)]
(let [e (.first ^clojure.lang.ISeq s)]
(if (include e) s (next s))))
(take-while include (. sc seq false)))))
([^clojure.lang.Sorted sc start-test start-key end-test end-key]
(when-let [[e :as s] (. sc seqFrom end-key false)]
(take-while (mk-bound-fn sc start-test start-key)
(if ((mk-bound-fn sc end-test end-key) e) s (next s))))))
Even faster if you consider the happy path
(defn <mk-bound-fn
{:private true}
[^clojure.lang.Sorted sc key]
(fn [e]
(< ^long (.. sc comparator (compare (. sc entryKey e) key)) 0)))
(defn <=mk-bound-fn
{:private true}
[^clojure.lang.Sorted sc key]
(fn [e]
(<= ^long (.. sc comparator (compare (. sc entryKey e) key)) 0)))
(defn mk-bound-fn
{:private true}
[^clojure.lang.Sorted sc test key]
(fn [e]
(test ^long (.. sc comparator (compare (. sc entryKey e) key)) 0)))
(defn -seek
[include ^clojure.lang.Sorted sc key]
(when-let [s (. sc seqFrom key false)]
(let [e (.first ^clojure.lang.ISeq s)]
(if (include e) s (next s)))))
(defn -rsubseq
([^clojure.lang.Sorted sc test key]
(cond
(identical? < test)
(let [include (<mk-bound-fn sc key)]
(-seek include sc key))
(identical? <= test)
(let [include (<=mk-bound-fn sc key)]
(-seek include sc key))
true
(let [include (mk-bound-fn sc test key)]
(take-while include (. sc seq false)))))
([^clojure.lang.Sorted sc start-test start-key end-test end-key]
(when-let [[e :as s] (. sc seqFrom end-key false)]
(take-while (mk-bound-fn sc start-test start-key)
(if ((mk-bound-fn sc end-test end-key) e) s (next s))))))
Very interesting. I'll look at those.
Also profiled, see the attached svg (as txt, sorry) for ref. 05-cpu-flamegraph-2020-12-15-17-57-28.txt
Finally got around to finish this. If you drop in the new implementations you should see a ~10x speed up for the test cases you've provided on a core sorted map
good deal 👍
For reference, I am using a sorted map to denote a set of intervals. I would like to quickly do intersection tests for an arbitrary point x to see if it intersects with a known interval. So, for the ordered map {a b}, the key a denotes the left bound of an interval, and b denotes the right, e.g. [a b]. If a value is contained by the interval, it satisfies the properties of (>= x a), (<= x b). Since we already have sorted maps out of the box, this seems like a simple use case. We can leverage the operation
rsubseq
to leverage the tree's structure and search for entries where the key fits our first criteria, then test the val for the second criteria. Notably, clojure.data.avl provides more robust features and is possibly faster, so we'll include it as an option.Let's look at performance.
So at least for this operation (maybe there other others), direct method invocation and hinting really helps out in the peformance. 112ns vs 1.4microseconds is pretty substantial. This could open the door to additional performance questions related to the less often used sorted collections in core.