Netflix / PigPen

Map-Reduce for Clojure
Apache License 2.0

Performance #9

mbossenbroek closed 10 years ago

mbossenbroek commented 10 years ago

@daveray @johnmidgley

Perf improvements on multiple fronts:

1) Changed key-selectors to be bind operations. This lets them be merged with upstream work and removes a serialization step per group or join.
2) Tuned nippy. Disabled the pre-defined header per serialized value.
3) Allow raw Pig options. Add the following to any options map: :pig-options {"pig.option.name" value}
4) Implement Pig's Accumulator interface:

Pig's Accumulator interface uses a batched push method. It is called multiple times with smaller batches of values. PigPen uses a core.async channel to invert this pattern and provide a lazy sequence of values to the accumulator function. The user function will block until values are pushed through the channel.
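A rough sketch of that inversion, for illustration only and not the code in this PR. It assumes core.async is on the classpath. The names `lazy-seq-from-chan`, `start-accumulator`, `accumulate!` and `get-value` are hypothetical stand-ins (the last two loosely mirror the `accumulate` and `getValue` methods of Pig's Accumulator), and the buffer size of 16 is arbitrary.

```clojure
(require '[clojure.core.async :as a])

(defn lazy-seq-from-chan
  "Lazy sequence of the values taken from ch. Realizing an element
  blocks until a value is available; the seq ends when ch is closed."
  [ch]
  (lazy-seq
    (when-some [v (a/<!! ch)]
      (cons v (lazy-seq-from-chan ch)))))

(defn start-accumulator
  "Hypothetical helper: runs user-fn on another thread, handing it a
  lazy seq that is fed by a channel."
  [user-fn]
  (let [ch     (a/chan 16) ; arbitrary buffer size, an assumption
        result (future (user-fn (lazy-seq-from-chan ch)))]
    {:ch ch :result result}))

(defn accumulate!
  "Stand-in for the batched push: puts each value of one batch on the
  channel. Values must be non-nil, since channels reject nil."
  [{:keys [ch]} batch]
  (doseq [v batch]
    (a/>!! ch v)))

(defn get-value
  "Stand-in for the final call: closes the channel so the lazy seq
  terminates, then waits for the user function's result."
  [{:keys [ch result]}]
  (a/close! ch)
  @result)

;; Usage: two pushed batches are seen by the user fn as one sequence.
(def acc (start-accumulator #(reduce + 0 %)))
(accumulate! acc [1 2 3])
(accumulate! acc [4 5])
(get-value acc) ; => 15
```

The user function runs on its own thread here because it blocks while waiting for values; the pushing side only blocks when the channel buffer is full.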