NICTA / scoobi

A Scala productivity framework for Hadoop.
http://nicta.github.com/scoobi/

objectFromTextFile and friends shouldn't execute a MR job #288

Closed blever closed 11 years ago

blever commented 11 years ago

There are several DObject "factory" methods of the form objectFromXXX(path: String): DObject[YYY], for example objectFromTextFile and objectKeyFromSequenceFile.

In their current implementation they all delegate to their DList counterpart and call head on the result. This runs a MapReduce job that reads in the file and then writes it back out again, just to obtain a single value. It should be possible to provide an implementation that reads the file(s) directly and performs the necessary transformations to create a DObject.
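
A rough sketch of the "read it directly" idea, using only the Scala standard library (2.13 for scala.util.Using) and a local file. LazyValue and DirectRead.firstLineOf are hypothetical names made up for illustration; they are not Scoobi's API, and a real implementation would presumably go through the Hadoop FileSystem API and produce an actual DObject.

```scala
import java.nio.file.Path
import scala.io.Source
import scala.util.Using

// Hypothetical stand-in for DObject: a value computed lazily, at most once.
final class LazyValue[A](compute: => A) {
  lazy val value: A = compute
  def map[B](f: A => B): LazyValue[B] = new LazyValue(f(value))
}

object DirectRead {
  // Hypothetical helper: reads the first line straight from the file when
  // the value is first demanded, with no intermediate job.
  // Assumption: a local, UTF-8 encoded text file.
  def firstLineOf(path: Path): LazyValue[String] =
    new LazyValue(
      Using.resource(Source.fromFile(path.toFile, "UTF-8")) { src =>
        val lines = src.getLines()
        if (lines.hasNext) lines.next()
        else throw new NoSuchElementException(s"empty file: $path")
      }
    )
}
```

Nothing is read until value is accessed, so the result can still be passed around and transformed with map before any I/O happens.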