NICTA / scoobi

A Scala productivity framework for Hadoop.
http://nicta.github.com/scoobi/

Make the converter used in fromTextFileWithPaths accessible #315

Closed: anwarrizal closed this issue 10 years ago

anwarrizal commented 10 years ago

My use case: I need to use a TextSource with a custom InputFormat, but I still need access to the path of the file each line came from. TextInput.fromTextFileWithPaths is nice, but unfortunately it offers no way to specify a custom InputFormat.

To work around this, I had to copy and paste the InputConverter used by TextInput.fromTextFileWithPaths into my application:

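    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.FileSplit

    // Pairs each line with the path of the file it was read from
    val converter = new InputConverter[LongWritable, Text, (String, String)] {
      def fromKeyValue(context: InputContext, k: LongWritable, v: Text) = {
        // The split seen here is a TaggedInputSplit wrapping the real FileSplit
        val taggedSplit = context.getInputSplit.asInstanceOf[TaggedInputSplit]
        val fileSplit = taggedSplit.inputSplit.asInstanceOf[FileSplit]
        val path = fileSplit.getPath.toUri.toASCIIString
        (path, v.toString)
      }
    }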

and then

    val lines = fromTextSource(TextSource(List(myinput), classOf[MyInputFormat], converter))

It would be more convenient if this InputConverter were publicly available, just as defaultTextConverter is.
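A minimal, self-contained sketch of what the request amounts to: expose the path-tagging converter as a public value next to the default text converter, so it can be handed to any TextSource regardless of InputFormat. This is not Scoobi's actual API. `Split`, `Context`, `Converter` and the name `textWithPathConverter` are hypothetical stand-ins, and the Hadoop key/value types are replaced with plain `Long`/`String` so the sketch compiles on its own.

```scala
// Hypothetical stand-ins for the Hadoop/Scoobi types (not the real API)
final case class Split(path: String)   // plays the role of FileSplit
final case class Context(split: Split) // plays the role of InputContext

trait Converter[K, V, A] {
  def fromKeyValue(context: Context, k: K, v: V): A
}

object TextConverters {
  // Analogue of defaultTextConverter: keep only the line's text
  val defaultTextConverter: Converter[Long, String, String] =
    new Converter[Long, String, String] {
      def fromKeyValue(context: Context, k: Long, v: String): String = v
    }

  // Hypothetical public name for the converter fromTextFileWithPaths uses
  // internally: pair each line with the path of its source file
  val textWithPathConverter: Converter[Long, String, (String, String)] =
    new Converter[Long, String, (String, String)] {
      def fromKeyValue(context: Context, k: Long, v: String): (String, String) =
        (context.split.path, v)
    }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val ctx = Context(Split("hdfs://namenode/data/part-00000"))
    // prints (hdfs://namenode/data/part-00000,hello)
    println(TextConverters.textWithPathConverter.fromKeyValue(ctx, 0L, "hello"))
  }
}
```

With a public value like this, the copied converter above could be replaced by a reference to it and passed to TextSource together with the custom InputFormat.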