aphyr / tesser

Clojure reducers, but for parallel execution: locally and on distributed systems.

Unexpected behavior using group-by and post-combine together #10

Open kasperlanger opened 9 years ago

kasperlanger commented 9 years ago

I was hoping to extend the group-by example with a (post-combine inc), like this:

(->> (t/group-by :type)
     (t/map :mass)
     (t/max)
     (t/post-combine inc)
     (t/tesser [[{:name :electron, :type :lepton, :mass 0.51}
                 {:name :muon,     :type :lepton, :mass 105.65}
                 {:name :up,       :type :quark,  :mass 1.5}
                 {:name :down,     :type :quark,  :mass 3.5}]]))
; => {:lepton 106.65, :quark 4.5}

However, it raises an exception: the post-combine is attached to the top of the fold, so inc receives the whole group-by result (a map) rather than each group's value.
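A minimal sketch of the failure in plain Clojure, without Tesser. It assumes the value reaching the top-level post-combine is the finished group-by map shown in the expected output; the name `grouped` is only illustrative.

```clojure
;; Illustration only: `grouped` stands in for the value handed to the
;; top-level post-combine once group-by has finished.
(def grouped {:lepton 105.65, :quark 3.5})

;; inc expects a number, so handing it the whole map throws.
(def outcome
  (try
    (inc grouped)
    (catch ClassCastException _
      :not-a-number)))

outcome
; => :not-a-number
```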

aphyr commented 9 years ago

Hmm, yeah that is a tough one, because often you do want the post-combine to operate on the results of the group-by, but your confusion is certainly legitimate!

The easy workaround here is (t/post-combine (partial tesser.utils/map-vals inc)), I suppose.
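To show what the workaround does to the grouped result, here is a sketch in plain Clojure. The local `map-vals` is a hypothetical stand-in written for this example so it runs without Tesser; it is assumed to behave like tesser.utils/map-vals, applying a function to every value of a map.

```clojure
;; Hypothetical stand-in for tesser.utils/map-vals, defined locally so the
;; snippet is self-contained. Applies f to each value of map m.
(defn map-vals [f m]
  (into {} (map (fn [[k v]] [k (f v)])) m))

;; The function passed to post-combine in the workaround:
(def bump-each-group (partial map-vals inc))

(bump-each-group {:lepton 105.65, :quark 3.5})
; => {:lepton 106.65, :quark 4.5}
```

The cost is that every later transformation has to know it is operating on a map of groups rather than on a single group's value.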

kasperlanger commented 9 years ago

One problem with the workaround is that you quickly end up with transformations that don't compose.

An example is trying to use group-by and range together, like this:

(->> (t/group-by :name)
     (t/map :score)
     (t/range)
     (t/tesser [[{:name "Kasper" :score 21}
                 {:name "Kasper" :score 22}
                 {:name "Kyle" :score 42}]]))
; => [nil nil]

aphyr commented 9 years ago

That's a great objection. Can you find a coherent way to fix it? I'd love to put time into this right now but I've got some other things going on. Happy to see a PR though!

kasperlanger commented 9 years ago

I'll give it a try. My best bet is changing group-by to use compile-fold, like fuse does. Then the above example would look like this:

(->> (t/group-by' :name
                  (->> (t/map :score)
                       (t/range)))
     (t/tesser [[{:name "Kasper" :score 21}
                 {:name "Kasper" :score 22}
                 {:name "Kyle" :score 42}]]))

What are your thoughts on that approach?
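The intended semantics of the proposal (a complete fold run once per group) can be modelled sequentially in plain Clojure. This is only a sketch: `group-fold` and `score-range` are hypothetical names, nothing here is Tesser's implementation, and it assumes a range fold yields a [min max] pair.

```clojure
;; Sequential model only; `group-fold` is a hypothetical name, not a Tesser
;; function. It applies `per-group` to the rows of each group independently.
(defn group-fold [key-fn per-group coll]
  (into {}
        (map (fn [[k rows]] [k (per-group rows)]))
        (group-by key-fn coll)))   ; clojure.core/group-by

;; Stand-in for the nested (t/map :score) + (t/range) fold,
;; assuming range means a [min max] pair.
(defn score-range [rows]
  (let [scores (map :score rows)]
    [(apply min scores) (apply max scores)]))

(group-fold :name score-range
            [{:name "Kasper" :score 21}
             {:name "Kasper" :score 22}
             {:name "Kyle" :score 42}])
; => {"Kasper" [21 22], "Kyle" [42 42]}
```

Because the per-group function is passed in explicitly, anything applied afterwards sees the finished map of groups, which is the distinction the nested form makes visible.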

kasperlanger commented 9 years ago

And another example, with post-combine at different stages:

(->> (t/group-by :type
                 (->> (t/map :mass)
                      (t/max)
                      (t/post-combine inc)))
     (t/post-combine #(assoc % :foo :bar))
     (t/tesser [[{:name :electron, :type :lepton, :mass 0.51}
                 {:name :muon,     :type :lepton, :mass 105.65}
                 {:name :up,       :type :quark,  :mass 1.5}
                 {:name :down,     :type :quark,  :mass 3.5}]]))

; => {:lepton 106.65, :quark 4.5, :foo :bar}

sbelak commented 8 years ago

kasperlanger's solution seems neat (a pathetic "me too" until I wrap my head around Tesser enough to produce a PR).