Closed: eentzel closed this 9 years ago
Yep, it does look like Cascading STILL only supports the older `mapred` API. Sounds like a design decision they made long ago that still stands. Unfortunately, `SSTableInputFormat` wasn't designed to be used with Cascading/Scalding, which is why we now have this impedance mismatch. :disappointed:
I would use this if it works out of the box with little effort. Otherwise I would stick with what we have. It's still pretty fast to get the output you need.
@bvanberg
If we want to make a Scalding source out of `SSTableInputFormat`, I think this is the start of what we'd need. It's basically copy-n-paste from `SSTableInputFormat`, extending `org.apache.hadoop.mapred.FileInputFormat` instead of `org.apache.hadoop.mapreduce.FileInputFormat`.

The thing I'm hung up on now is a `RecordReader` implementation to go with it: the old & new interfaces are just different enough that I'm not quite sure how to translate the existing implementation.
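For what it's worth, one way to bridge the two `RecordReader` shapes might be a thin adapter rather than a rewrite. The sketch below is not the project's code and does not compile against Hadoop: `NewStyleReader` and `OldStyleReader` are simplified stand-ins for `org.apache.hadoop.mapreduce.RecordReader` and `org.apache.hadoop.mapred.RecordReader` (checked exceptions, `initialize(...)` and `getPos()` are left out), and `Box`, `ListReader`, `OldApiAdapter` and `OldApiAdapterDemo` are hypothetical names made up for illustration. The main difference it tries to show: in the new API the reader owns the current key/value and hands them out, while in the old API the caller passes in reusable objects that `next(key, value)` has to fill.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for the new-API reader: the reader owns the current key/value.
interface NewStyleReader<K, V> {
    boolean nextKeyValue();
    K getCurrentKey();
    V getCurrentValue();
    float getProgress();
    void close();
}

// Stand-in for the old-API reader: the caller owns reusable key/value objects.
interface OldStyleReader<K, V> {
    K createKey();
    V createValue();
    boolean next(K key, V value);
    float getProgress();
    void close();
}

// Hypothetical mutable holder, playing the role of a Writable such as Text.
final class Box {
    String content = "";
    void set(String s) { content = s; }
}

// Toy new-style reader over an in-memory list of {key, value} pairs.
final class ListReader implements NewStyleReader<Box, Box> {
    private final List<String[]> rows;
    private int index = -1;
    private final Box key = new Box();
    private final Box value = new Box();

    ListReader(List<String[]> rows) { this.rows = rows; }

    public boolean nextKeyValue() {
        if (index + 1 >= rows.size()) return false;
        index++;
        key.set(rows.get(index)[0]);
        value.set(rows.get(index)[1]);
        return true;
    }
    public Box getCurrentKey() { return key; }
    public Box getCurrentValue() { return value; }
    public float getProgress() {
        return rows.isEmpty() ? 1.0f : (index + 1) / (float) rows.size();
    }
    public void close() { }
}

// Adapter: exposes a new-style reader through the old-style interface.
final class OldApiAdapter implements OldStyleReader<Box, Box> {
    private final NewStyleReader<Box, Box> delegate;

    OldApiAdapter(NewStyleReader<Box, Box> delegate) { this.delegate = delegate; }

    public Box createKey() { return new Box(); }
    public Box createValue() { return new Box(); }

    public boolean next(Box key, Box value) {
        if (!delegate.nextKeyValue()) return false;
        // Copy into the caller's objects instead of returning the reader's own.
        key.set(delegate.getCurrentKey().content);
        value.set(delegate.getCurrentValue().content);
        return true;
    }
    public float getProgress() { return delegate.getProgress(); }
    public void close() { delegate.close(); }
}

final class OldApiAdapterDemo {
    // Drives the reader the way old-API callers do: create once, refill per record.
    static String drain(OldStyleReader<Box, Box> reader) {
        Box k = reader.createKey();
        Box v = reader.createValue();
        StringBuilder sb = new StringBuilder();
        while (reader.next(k, v)) {
            sb.append(k.content).append('=').append(v.content).append(';');
        }
        reader.close();
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"row1", "a"}, new String[] {"row2", "b"});
        System.out.println(drain(new OldApiAdapter(new ListReader(rows))));
    }
}
```

With real Hadoop types, the copy step in `next` would depend on the key/value `Writable` classes involved, and the new API's `initialize(split, context)` call would have to happen somewhere before the first `next`, most likely when the old-style `InputFormat` creates the reader. I haven't checked how that fits with the existing `SSTableInputFormat` reader, so treat this as a starting point only.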