Netflix / PigPen

Map-Reduce for Clojure
Apache License 2.0

PigPenFnAlgebraic shouldn't implement Accumulator #21

Closed by mbossenbroek 10 years ago

mbossenbroek commented 10 years ago

Pig, it turns out, doesn't support folding more than one grouping at a time. A group-by fold works as expected, but a cogroup with two or more relations falls back to a non-folding interface.

For such a cogroup, Pig was falling back to the Accumulator interface, which is not compatible with Algebraic at the moment. As a result, each fold operation returned nil instead of its value.

Not implementing Accumulator is a stop-gap solution: it returns the correct result, but it might not be as efficient. The long-term solution is to compute two independent group-by/fold operations and then a final cogroup to bring the folded results together. This will come with a large disclaimer that it creates 3+ Hadoop jobs.
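The shape of that long-term plan can be sketched on plain in-memory data. This is only an illustration, not PigPen or Pig code: `fold_by_key`, `cogroup_folded`, and the sample relations `orders` and `visits` are hypothetical names made up for the example, and each function stands in for what would be a separate job.

```python
from collections import defaultdict


def fold_by_key(records, key_fn, value_fn, combine, init):
    """Stand-in for one group-by/fold job: reduce each key's values to one result."""
    folded = defaultdict(lambda: init)
    for record in records:
        key = key_fn(record)
        folded[key] = combine(folded[key], value_fn(record))
    return dict(folded)


def cogroup_folded(left, right, missing=None):
    """Stand-in for the final cogroup: align two already-folded results by key."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k, missing), right.get(k, missing)) for k in keys}


# Hypothetical sample relations: (key, value) pairs.
orders = [("a", 10), ("b", 5), ("a", 7)]
visits = [("a", 1), ("c", 1), ("a", 1)]

# Jobs 1 and 2: each relation is folded on its own, so each fold touches one grouping.
order_totals = fold_by_key(orders, lambda r: r[0], lambda r: r[1], lambda x, y: x + y, 0)
visit_counts = fold_by_key(visits, lambda r: r[0], lambda r: 1, lambda x, y: x + y, 0)

# Job 3: cogroup the small, already-folded results.
result = cogroup_folded(order_totals, visit_counts)
print(result)
```

Because each fold sees a single grouping, the restriction described above never applies; the cost is the extra pass that joins the folded results.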