cgrand / xforms

Extra transducers and reducing fns for Clojure(script)
575 stars 32 forks source link

[bug] multiplex is nondeterministic #27

Closed den1k closed 5 years ago

den1k commented 6 years ago

From the docs and because we're passing it a vector, I'd expect multiplex to return transformed values in the order of delivery and transformation.

(into [] (x/multiplex [(map inc) (map str)]) (range 10))
=> [1 "0" 2 "1" 3 "2" 4 "3" 5 "4" 6 "5" 7 "6" 8 "7" 9 "8" 10 "9"]
(into [] (x/multiplex [(map inc) (map str)]) (range 10))
=> ["0" 1 "1" 2 "2" 3 "3" 4 "4" 5 "5" 6 "6" 7 "7" 8 "8" 9 "9" 10]

From my observation this only occurs with vector while return values with hash-maps are properly ordered:

(into [] (x/multiplex {:1 (map str) :2 (map identity)}) (range 10))
=> [[:1 "0"]
 [:2 0]
 [:1 "1"]
 [:2 1]
 [:1 "2"]
 [:2 2]
 [:1 "3"]
 [:2 3]
 [:1 "4"]
 [:2 4]
 [:1 "5"]
 [:2 5]
 [:1 "6"]
 [:2 6]
 [:1 "7"]
 [:2 7]
 [:1 "8"]
 [:2 8]
 [:1 "9"]
 [:2 9]]
cgrand commented 6 years ago

Do you have a use case where the deterministic behavior is needed?

den1k commented 6 years ago

Yes, I was implementing hull-moving-average which combines several weighted-moving-averages. To x/partition them I must know which is which.

Because vectors didn't work I ended up having to use maps which behave deterministically:

(defn factorial
  ([n]
   (assert (integer? n))
   (factorial 0 n))
  ([sum n]
   (if (zero? n)
     sum
     (recur (+ sum n) (dec n)))))

(defn wa
  ([] nil)
  ([^doubles acc]
   (when acc (/ (aget acc 1) (factorial (int (aget acc 0))))))
  ([^doubles acc x]
   (let [acc (or acc (double-array 2))] ; n, sum
     (let [n (inc (aget acc 0))]
       (doto acc
         (aset 0 n)
         (aset 1 (+ (aget acc 1) (* n x))))))))

(def wa-xf (x/reduce wa))

(defn wma-xf [period]
  (x/partition period 1 wa-xf))

(defn hma-xf
  "Hull Moving Average
  - https://github.com/thi-ng/umbrella/blob/master/packages/transducers-stats/src/hma.ts#L20
  - https://www.technicalindicators.net/indicators-technical-analysis/143-hma-hull-moving-average
  - https://www.fidelity.com/learning-center/trading-investing/technical-analysis/technical-indicator-guide/hull-moving-average"
  [period]
  {:pre [(int? period)]}
  (let
   [half-period (-> period (/ 2) int)
    psqrt       (int (Math/sqrt period))
    drop-p      (if (even? period) half-period (inc half-period))]
    (comp
     (x/multiplex {:half (comp
                          (drop drop-p)
                          (wma-xf half-period))
                   :all  (wma-xf period)})
     (x/partition 2)
     (map (fn [[[_ h] [_ a]]] (- (* 2 h) a))) ; <---------------- order important
     (wma-xf psqrt))))

Here's is a typescript based transducer version (the multiplex here acts like a deterministic (comp (multiplex [...n-fns]) (partition n))): https://github.com/thi-ng/umbrella/blob/master/packages/transducers-stats/src/hma.ts#L20

cgrand commented 6 years ago

I see but even with maps I'm not at ease with:

(map (fn [[[_ h] [_ a]]] (- (* 2 h) a))) ; <---------------- order important

because it relies on the two transducers working in lock steps.

For one input item, a transducer may output any number of items, there's no guarantee on that so multiplex can't generally promise to interleave outputs properly. That's why the two modes are both non-deterministic but the map one tags its output so you can recognize them.

The deterministic order is an implementation detail of clojure small maps:

=> (into [] (x/multiplex (zipmap (range 8) (repeat (map inc)))) (range 2))
[[0 1] [1 1] [2 1] [3 1] [4 1] [5 1] [6 1] [7 1] [0 2] [1 2] [2 2] [3 2] [4 2] [5 2] [6 2] [7 2]]
=> (into [] (x/multiplex (zipmap (range 9) (repeat (map inc)))) (range 2))
[[0 1] [7 1] [1 1] [4 1] [6 1] [3 1] [2 1] [5 1] [8 1] [0 2] [7 2] [1 2] [4 2] [6 2] [3 2] [2 2] [5 2] [8 2]]

Your need is legit but I'm pretty sure that modifying multiplex is the right answer.

You want to take action when each sub transducer has output a value. It sounds a bit like a temporal join. Any idea is welcome

den1k commented 6 years ago

@cgrand FYI, this is off-topic but related: https://github.com/thi-ng/umbrella/issues/37#issuecomment-417346728

aisamu commented 6 years ago

It hardly qualifies as a use-case, but I also saw that behavior trying to implement Fizzbuzz.

(let [test (fn [div sss]
             (fn [num]
               (when (zero? (mod num div)) sss)))
      chain [(map (test 3 "Fizz")) (map (test 5 "Buzz"))]
      xform (comp (x/multiplex [(comp (x/multiplex chain)
                                      (x/partition (count chain) (x/reduce str))
                                      (map #(if (= "" %) nil %)))
                                identity])
                  (x/partition (count chain) (x/reduce rfs/some)))]
  (eduction xform (range 16)))

The result of running that repeatedly got me very confused:

(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
("BuzzFizz" 1 2 "Fizz" 4 "Buzz" "Fizz" 7 8 "Fizz" "Buzz" 11 "Fizz" 13 14 "BuzzFizz")
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
("BuzzFizz" 1 2 "Fizz" 4 "Buzz" "Fizz" 7 8 "Fizz" "Buzz" 11 "Fizz" 13 14 "BuzzFizz")
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
("BuzzFizz" 1 2 "Fizz" 4 "Buzz" "Fizz" 7 8 "Fizz" "Buzz" 11 "Fizz" 13 14 "BuzzFizz")
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
("FizzBuzz" 1 2 "Fizz" 4 "Buzz" "Fizz" 7 8 "Fizz" "Buzz" 11 "Fizz" 13 14 "FizzBuzz")
("BuzzFizz" 1 2 "Fizz" 4 "Buzz" "Fizz" 7 8 "Fizz" "Buzz" 11 "Fizz" 13 14 "BuzzFizz")
("BuzzFizz" 1 2 "Fizz" 4 "Buzz" "Fizz" 7 8 "Fizz" "Buzz" 11 "Fizz" 13 14 "BuzzFizz")
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
("BuzzFizz" 1 2 "Fizz" 4 "Buzz" "Fizz" 7 8 "Fizz" "Buzz" 11 "Fizz" 13 14 "BuzzFizz")
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
("BuzzFizz" 1 2 "Fizz" 4 "Buzz" "Fizz" 7 8 "Fizz" "Buzz" 11 "Fizz" 13 14 "BuzzFizz")
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
("BuzzFizz" 1 2 "Fizz" 4 "Buzz" "Fizz" 7 8 "Fizz" "Buzz" 11 "Fizz" 13 14 "BuzzFizz")
("FizzBuzz" 1 2 "Fizz" 4 "Buzz" "Fizz" 7 8 "Fizz" "Buzz" 11 "Fizz" 13 14 "FizzBuzz")
("BuzzFizz" 1 2 "Fizz" 4 "Buzz" "Fizz" 7 8 "Fizz" "Buzz" 11 "Fizz" 13 14 "BuzzFizz")
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)

It makes sense now: multiplex is non-deterministic, but apparently consistent throughout a whole "run". Given that, is this a completely misguided usage? I tried following your directions from the other issue (https://github.com/cgrand/xforms/issues/26 -replace it with transjuxt's), but had no luck fitting the required x/some in the above form

cgrand commented 5 years ago

I'm not willing to make the multiplex contract stronger at the moment.