fingltd / 4mc

4mc - splittable lz4 and zstd in hadoop/spark/flink

How to use 4mc in Hive? #51

Closed WindBrush closed 2 years ago

WindBrush commented 3 years ago

I want to create a Hive table on top of a compressed file. When Hive reads the file, it hands it to multiple mappers rather than a single one only if the compressed file is splittable. How can I do this with 4mc? Is the only option to port the InputFormat from the new mapreduce API to the old one that Hive supports?

WindBrush commented 2 years ago

For anyone who runs into the same problem: 4mc can't be used in Hive out of the box because Hive uses the old Hadoop API (mapred.*) while 4mc uses the new Hadoop API (mapreduce.*). This only matters for the InputFormat module; the compression module is unaffected. So you can use 4mc in Hive by rewriting the InputFormat against the old API.
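To make the shape of that rewrite a bit more concrete, here is a rough, dependency-free sketch of one way to do it: an adapter that exposes an old-style interface and delegates split computation to a new-style format. Everything in it is a hypothetical stand-in (Split, NewStyleInputFormat, OldStyleInputFormat, BlockAlignedFormat, OldApiAdapter); none of these are Hadoop or 4mc classes. The stand-ins only mimic one real difference between the two APIs: the new getSplits returns a List, while the old one returns an array and receives a numSplits hint. The fixed-size block splitter is a toy, not how 4mc locates its blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an input split: a byte range of one file.
final class Split {
    final String path;
    final long start;
    final long length;

    Split(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }
}

// Stand-in shaped like the new API, where getSplits returns a List.
interface NewStyleInputFormat {
    List<Split> getSplits(String path, long fileLength);
}

// Stand-in shaped like the old API, where getSplits returns an array
// and receives a numSplits hint.
interface OldStyleInputFormat {
    Split[] getSplits(String path, long fileLength, int numSplits);
}

// Toy new-style format that cuts a file at fixed block boundaries.
// It only stands in for "a format that knows where it can be split".
final class BlockAlignedFormat implements NewStyleInputFormat {
    private final long blockSize;

    BlockAlignedFormat(long blockSize) {
        this.blockSize = blockSize;
    }

    @Override
    public List<Split> getSplits(String path, long fileLength) {
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += blockSize) {
            splits.add(new Split(path, start, Math.min(blockSize, fileLength - start)));
        }
        return splits;
    }
}

// Adapter: presents the old-style interface, reuses the new-style split logic.
final class OldApiAdapter implements OldStyleInputFormat {
    private final NewStyleInputFormat delegate;

    OldApiAdapter(NewStyleInputFormat delegate) {
        this.delegate = delegate;
    }

    @Override
    public Split[] getSplits(String path, long fileLength, int numSplits) {
        // numSplits is treated as a hint only; block boundaries decide the count.
        return delegate.getSplits(path, fileLength).toArray(new Split[0]);
    }
}

final class OldApiAdapterSketch {
    public static void main(String[] args) {
        OldStyleInputFormat format =
                new OldApiAdapter(new BlockAlignedFormat(4L * 1024 * 1024));
        Split[] splits = format.getSplits("data.4mc", 10L * 1024 * 1024, 1);
        for (Split s : splits) {
            System.out.println(s.path + " offset=" + s.start + " length=" + s.length);
        }
    }
}
```

A real port would implement the actual old-API InputFormat and RecordReader interfaces rather than these stand-ins. Hive would then be pointed at the resulting class through the STORED AS INPUTFORMAT ... OUTPUTFORMAT ... clause of the table definition.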