fingltd / 4mc

4mc - splittable lz4 and zstd in hadoop/spark/flink

TextInputFormat with EMR Streaming #17

Open refaelos opened 7 years ago

refaelos commented 7 years ago

Hey @carlomedas,

EMR Streaming uses the old FileInputFormat API (it requires the old `mapred` package), so we can't find a way to read the compressed files within EMR Streaming steps.

Is there a wrapper for FourMzTextInputFormat that uses the older API?

refaelos commented 7 years ago

I managed to create an input format that works with the old API (usable with EMR Streaming).

On our fork - https://github.com/soomla/4mc
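
For readers who want to see how an old-API input format plugs into a streaming job, here is a rough dry-run sketch. It only assembles and prints a Hadoop Streaming command. The jar paths, the bucket names, and the class name `com.example.mapred.OldApiFourMzTextInputFormat` are placeholders and assumptions, not names taken from this thread or from the fork; substitute the real class from whichever build you use. It also assumes the 4mc jar and its native libraries are already available on the cluster.

```shell
#!/usr/bin/env bash
# Dry-run sketch: assembles a Hadoop Streaming command and prints it.
# Every path and the input-format class name are placeholders (assumptions).

STREAMING_JAR="${STREAMING_JAR:-/path/to/hadoop-streaming.jar}"  # location varies by distribution
FOURMC_JAR="${FOURMC_JAR:-/path/to/hadoop-4mc.jar}"              # jar that contains the input format
INPUT_FORMAT="${INPUT_FORMAT:-com.example.mapred.OldApiFourMzTextInputFormat}"  # hypothetical class

build_streaming_cmd() {
  # $1 = input path, $2 = output path.
  # Generic options such as -libjars go before the streaming options.
  printf '%s ' hadoop jar "$STREAMING_JAR" \
    -libjars "$FOURMC_JAR" \
    -inputformat "$INPUT_FORMAT" \
    -input "$1" \
    -output "$2" \
    -mapper /bin/cat \
    -reducer /usr/bin/wc
}

# Print the command for inspection instead of running it.
build_streaming_cmd "s3://my-bucket/input/" "s3://my-bucket/output/"
echo
```

Hadoop Streaming's `-inputformat` option expects a class built on the old `org.apache.hadoop.mapred` API, which is why a wrapper of this kind is needed in the first place.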

carlomedas commented 7 years ago

OK, please submit a pull request if you like. If it doesn't impact the other existing formats, I'll merge it.

refaelos commented 7 years ago

@carlomedas I can do the PR, but I think it needs to support the Mc input format as well first. Don't you think?