NICTA / scoobi

A Scala productivity framework for Hadoop.
http://nicta.github.com/scoobi/

Bug in Scoobi job planner #298

Closed ivmaykov closed 11 years ago

ivmaykov commented 11 years ago

I found a bug in the Scoobi job planner and discussed it with @etorreborre on IRC last night. When Scoobi joins two DLists of the same type and one of them was computed from an input that was .join()ed to a zipped DObject, the planner generates a broken job plan.

Expected: 1 MR job: 1 Map stage that processes both inputs, 1 shuffle, 1 reduce

Actual: 2 MR jobs:

Minimal code that reproduces the issue is here: https://gist.github.com/ivmaykov/5c9b9fc7febc117e3ed8

Verbose output of a local run (Hadoop local mode, not in-memory) is here: https://gist.github.com/ivmaykov/cbdd2524f606feb0b60a
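
For readers who cannot open the gists, the dataflow shape described above can be modeled roughly as follows. This is only a sketch in plain Scala collections: it is not Scoobi code, it is not the contents of the reproduction gist, and it does not trigger the planner bug. It assumes that "joining" the two DLists means concatenating them and grouping by key, and that the zipped DObject is a pair of standalone values. Every name in it (`PlannerShapeSketch`, `joinValue`, `pipeline`, `Record`) is hypothetical.

```scala
// Plain Scala collections stand in for Scoobi's DList (Seq) and DObject (a single value).
// All names are hypothetical; nothing here is taken from the gist or from the Scoobi API.
object PlannerShapeSketch {
  type Record = (String, Int)

  // Stand-in for joining a single value to a list: pair the value with every element.
  def joinValue[A, B](value: A, list: Seq[B]): Seq[(A, B)] =
    list.map(b => (value, b))

  def pipeline(inputA: Seq[Record],
               inputB: Seq[Record],
               zipped: (Int, Int)): Map[String, Seq[Int]] = {
    // One input is joined to the zipped pair and mapped back to the common Record type.
    val derived: Seq[Record] =
      joinValue(zipped, inputA).map { case ((scale, offset), (key, n)) =>
        (key, n * scale + offset)
      }
    // Assumption: the two same-typed lists are concatenated and grouped by key,
    // which is the step that would need a single shuffle and reduce.
    (derived ++ inputB).groupBy(_._1).map { case (key, recs) => key -> recs.map(_._2) }
  }

  def main(args: Array[String]): Unit = {
    val result = pipeline(Seq("a" -> 1, "b" -> 2), Seq("a" -> 10, "c" -> 30), (2, 3))
    result.toSeq.sortBy(_._1).foreach(println)
  }
}
```

In this model both inputs feed one grouping step, which corresponds to the single map, shuffle, and reduce listed under "Expected".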