1) Changed key-selectors to be bind operations. They can now be merged with upstream work, which removes one serialization step per group or join.
2) Tuned nippy. Disabled the pre-defined header that was written with each serialized value.
3) Allowed raw Pig options. Add the following to any options map: :pig-options {"pig.option.name" value}
4) Implemented Pig's Accumulator interface:
Pig's Accumulator interface uses a batched push method that is called multiple times, each time with a smaller batch of values. PigPen uses a core.async channel to invert this pattern and provide a lazy sequence of values to the accumulator function. The user function blocks until values are pushed through the channel.
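To illustrate the push-to-pull inversion described in item 4, here is a minimal sketch in plain Clojure. It is not the code in this PR: `chan->lazy-seq`, `start-accumulator`, `push!`, and `finish!` are hypothetical names, and the buffer size of 16 is an arbitrary assumption. `push!` loosely stands in for Pig's batched `accumulate` call and `finish!` for `getValue`.

```clojure
(require '[clojure.core.async :as a])

;; Hypothetical helper: view a channel as a lazy sequence.
;; Realizing an element blocks until a value is available;
;; the sequence ends once the channel is closed and drained.
(defn chan->lazy-seq [ch]
  (lazy-seq
    (let [v (a/<!! ch)]
      (when-not (nil? v)
        (cons v (chan->lazy-seq ch))))))

;; Hypothetical helper: run the user function f on another thread
;; against a lazy seq, and return functions for the pushing side.
(defn start-accumulator [f]
  (let [ch     (a/chan 16)                      ; buffer size is an assumption
        result (future (f (chan->lazy-seq ch)))]
    {:push!   (fn [batch]                       ; called once per batch
                (doseq [v batch]
                  (a/>!! ch v)))
     :finish! (fn []                            ; no more batches: close and wait
                (a/close! ch)
                @result)}))

;; Usage: two batches are pushed, then the result is read.
(let [{:keys [push! finish!]} (start-accumulator #(reduce + %))]
  (push! [1 2 3])
  (push! [4 5])
  (finish!))
;; => 15
```

The user function (`reduce +` here) sees an ordinary sequence and never deals with batches. It simply blocks whenever it reads ahead of what has been pushed so far. Note that core.async channels cannot carry `nil`, so a real implementation would need to wrap or encode nil values.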
@daveray @johnmidgley
Perf improvements on multiple fronts.