My use case is the following: I need to use a TextSource with a custom InputFormat, but I still need access to the file name of the source. TextInput.fromTextFileWithPaths is nice, but unfortunately it offers no way to specify a custom InputFormat.
To work around this, I am forced to copy and paste the InputConverter used by TextInput.fromTextFileWithPaths into my application:
val converter = new InputConverter[LongWritable, Text, (String, String)] {
  def fromKeyValue(context: InputContext, k: LongWritable, v: Text) = {
    // Unwrap the tagged split to reach the underlying Hadoop FileSplit
    val taggedSplit = context.getInputSplit.asInstanceOf[TaggedInputSplit]
    val fileSplit = taggedSplit.inputSplit.asInstanceOf[FileSplit]
    // Pair each line with the path of the file it came from
    val path = fileSplit.getPath.toUri.toASCIIString
    (path, v.toString)
  }
}
and then
val lines = fromTextSource(TextSource(List(myinput), classOf[MyInputFormat], converter))
It would be more convenient if this InputConverter were publicly available, just as defaultTextConverter already is.