fingltd / 4mc

4mc - splittable lz4 and zstd in hadoop/spark/flink

why is FourMcTextInputFormat not an InputFormat? #50

Closed robmaz closed 3 years ago

robmaz commented 3 years ago

So I try to run a streaming job with a .4mc-compressed input:

> hadoop jar $hadoop_streaming_jar -libjars $hadoop_4mc_jar -input txt.4mc -inputformat com.hadoop.mapreduce.FourMcTextInputFormat -output /test/out -mapper mapper.sh -reducer reducer.sh

and get an error:

Exception in thread "main" java.lang.RuntimeException: class com.hadoop.mapreduce.FourMcTextInputFormat not org.apache.hadoop.mapred.InputFormat

But

  [4mc-2.2.0]$ grep extends java/hadoop-4mc/src/main/java/com/hadoop/mapreduce/FourMcInputFormat.java
   public abstract class FourMcInputFormat<K, V> extends FileInputFormat<K, V> {

which, as per the API doc, in turn implements InputFormat:

  org.apache.hadoop.mapred
  Class FileInputFormat<K,V>
  java.lang.Object
  org.apache.hadoop.mapred.FileInputFormat<K,V>
  All Implemented Interfaces:
  InputFormat<K,V>

So why doesn't this work? Is this not supposed to work?

robmaz commented 3 years ago

Ok, wrong API doc entry. Apparently it's an org.apache.hadoop.mapreduce.InputFormat, not an org.apache.hadoop.mapred.InputFormat as the streaming interface expects ... I guess that makes it a duplicate of #17.
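The mismatch can be reproduced outside Hadoop with stand-in classes. These mocks are illustrative only, not the real Hadoop types: the point is that a class built on the new-API abstract class is simply not assignable to the old-API interface, which is roughly the compatibility check streaming applies when it resolves `-inputformat`:

```java
// Stand-in types (NOT the real Hadoop classes) mimicking the two MapReduce APIs.
public class ApiCheck {
    // plays the role of org.apache.hadoop.mapred.InputFormat (old API, an interface)
    interface OldInputFormat {}

    // plays the role of org.apache.hadoop.mapreduce.InputFormat (new API, an abstract class)
    static abstract class NewInputFormat {}

    // plays the role of com.hadoop.mapreduce.FourMcTextInputFormat, built on the new API
    static class FourMcTextInputFormat extends NewInputFormat {}

    // rough equivalent of the check behind the streaming error: the class
    // passed as -inputformat must be usable as the OLD-API interface
    static void requireOldApi(Class<?> cls) {
        if (!OldInputFormat.class.isAssignableFrom(cls)) {
            throw new RuntimeException(
                "class " + cls.getName() + " not " + OldInputFormat.class.getName());
        }
    }

    public static void main(String[] args) {
        try {
            requireOldApi(FourMcTextInputFormat.class);
            System.out.println("accepted");
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running this prints a "rejected: class ... not ..." message analogous to the streaming exception above: the two `InputFormat` types live in different packages and share no inheritance relationship, so no subclass of one can satisfy a check for the other.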