Netflix / PigPen

Map-Reduce for Clojure
Apache License 2.0

Fields refactor #113

Closed by mbossenbroek 9 years ago

mbossenbroek commented 9 years ago

@pkozikow

It's a big one; sorry about that. In the process, I pushed quite a bit of code from pigpen.cascading and the Java UDFs into pigpen.cascading.runtime. This probably churned the code a little more than I would have liked, but I think it ended up in a good place.

Here's the overview of the cascading changes:

  1. The command->flowdef fn is passed the command definition, a sequence of its ancestors (tuples of pipes and commands), and the flowdef
  2. The new command->flowdef+ manages the pipes produced by commands and looks up the ancestors for the next command
  3. Instead of passing around a map of taps, sinks, pipes, etc., each command returns a Pipe
  4. Taps & Sinks are added directly to the flowdef. (I'd like to revisit this, but I couldn't get the old way working)
  5. Projections are evaluated entirely on the runtime side; we don't need to figure out field mappings ahead of time
  6. :reduce replaces :group-all and is now a separate command
  7. I added oven rewrites to merge any group/fold or reduce/fold combinations. In Cascading, each of these is a single operation, so we make them a single operation in the model
  8. :generate is dependent on the previous command
  9. There's a new :noop command for working around self-joins
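
To make items 1 to 4 concrete, here is a rough, language-neutral sketch in Python of how that flow could look. It is not the PigPen code: `Pipe` and `FlowDef` are simplified stand-ins for the Cascading classes, the command maps and their keys (`id`, `type`, `ancestors`, `location`) are hypothetical, and `commands_to_flowdef` only approximates what command->flowdef+ is described as doing.

```python
from dataclasses import dataclass, field


@dataclass
class Pipe:
    """Stand-in for a Cascading Pipe: a name plus its upstream pipes."""
    name: str
    parents: tuple = ()


@dataclass
class FlowDef:
    """Stand-in for a Cascading FlowDef; taps and sinks are registered on it directly."""
    sources: dict = field(default_factory=dict)
    sinks: dict = field(default_factory=dict)


def command_to_flowdef(command, ancestors, flowdef):
    """Build and return the Pipe for a single command.

    `ancestors` is a sequence of (pipe, command) tuples for the command's inputs.
    """
    parent_pipes = tuple(pipe for pipe, _ in ancestors)
    pipe = Pipe(command["id"], parent_pipes)
    # Hypothetical command types: loads and stores touch the flowdef directly.
    if command["type"] == "load":
        flowdef.sources[pipe.name] = command["location"]
    elif command["type"] == "store":
        flowdef.sinks[pipe.name] = command["location"]
    return pipe


def commands_to_flowdef(commands, flowdef):
    """Rough analogue of command->flowdef+: track pipes and resolve ancestors."""
    pipes = {}     # command id -> Pipe produced by that command
    by_id = {}     # command id -> command definition
    for command in commands:  # assumed to be in dependency order
        ancestors = [(pipes[a], by_id[a]) for a in command.get("ancestors", [])]
        pipes[command["id"]] = command_to_flowdef(command, ancestors, flowdef)
        by_id[command["id"]] = command
    return pipes


# Usage: a three-command script (load -> map -> store).
commands = [
    {"id": "in", "type": "load", "location": "input.tsv"},
    {"id": "mapped", "type": "map", "ancestors": ["in"]},
    {"id": "out", "type": "store", "ancestors": ["mapped"], "location": "output.tsv"},
]
flowdef = FlowDef()
pipes = commands_to_flowdef(commands, flowdef)
```

The point of the shape is that each command only needs its own definition, its resolved ancestors, and the flowdef; the bookkeeping of which pipe belongs to which command lives in one place.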