Consider using Pangool?

Would it be worthwhile for Parkour to build on Pangool (http://pangool.net/) rather than directly on Hadoop's Java MapRed API?

Pangool is a thin Java layer on top of Hadoop MapRed. It makes most common tasks easier (e.g. joins and secondary sort) and improves on the raw API in several ways (using instances rather than classes, cleaner multiple inputs and outputs, proper text I/O formats, etc.), while keeping roughly the same performance (within about 5%). Its simple Tuple model removes the limitations of the key/value model, so one can group by essentially any combination of fields. It has no flow management and stays at the MapReduce level, which makes it a suitable tool for writing raw MapReduce jobs.
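The main benefit of the tuple model is grouping on any combination of fields, with a secondary sort inside each group. This idea can be illustrated without Hadoop. The following is a minimal in-memory sketch in Python and is not Pangool's API. The `shuffle` helper and the `url`/`date`/`visits` fields are hypothetical, and a real MapReduce job would spread this work across partitions and reducers.

```python
from itertools import groupby

# Hypothetical tuple schema used only for this illustration (not Pangool's API).
FIELDS = ("url", "date", "visits")

def shuffle(records, group_by, sort_by):
    """Simulate a MapReduce shuffle over tuples: group on any subset of
    fields, then order each group by further fields (a secondary sort)."""
    idx = {name: i for i, name in enumerate(FIELDS)}

    def group_key(t):
        return tuple(t[idx[f]] for f in group_by)

    def sort_key(t):
        # Group fields first, then secondary-sort fields, so each group
        # is delivered with its values already ordered.
        return group_key(t) + tuple(t[idx[f]] for f in sort_by)

    ordered = sorted(records, key=sort_key)
    return [(k, list(g)) for k, g in groupby(ordered, key=group_key)]

# Hypothetical sample tuples: (url, date, visits).
records = [
    ("a.com", "2013-01-02", 5),
    ("b.com", "2013-01-01", 7),
    ("a.com", "2013-01-01", 3),
]

for key, group in shuffle(records, group_by=["url"], sort_by=["date"]):
    print(key, [t[2] for t in group])
```

If you change `group_by` to `["url", "date"]`, the same records are regrouped without changing how they are encoded. With a single fixed key/value pair, this kind of change usually requires composite keys and custom comparators.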

We had the idea of creating a Clojure API on top of Pangool. In fact, we first worked on another abstraction that adds flow capabilities to Pangool and planned to add Clojure on top of that, but we have not finished it so far. We have been running Pangool for almost 2 years now and will release a 1.0 version in the near future. We believe the tool is quite stable and robust. We have used it with many of our clients and have heard of other use cases through the mailing list and elsewhere.

If this is of interest, we are keen to help.
