Open asfimport opened 9 months ago
Fokko Driesprong / @Fokko: Can you double check if this is still the case with the latest Parquet release? I did some relevant work a while ago: https://github.com/apache/parquet-mr/pull/1074
Atour Mousavi Gourabi / @amousavigourabi: Hi Fokko, as far as I'm aware, https://github.com/apache/parquet-mr/pull/1074 makes it possible to avoid directly instantiating a Hadoop-based CompressionCodecFactory when reading, but only if the user passes their own factory. Currently, however, we do not have any unhadooped CompressionCodecFactory implementations AFAIK (both CodecFactory and DirectCodecFactory have to deal with a Hadoop CompressionCodec at some point). As for the specific codecs, CompressionCodecName refers to 4 codecs from Hadoop itself and 3 that are implemented in Parquet but still implement both the Configurable and CompressionCodec interfaces from Hadoop. As I see it, this means users would have to implement quite a bit of this themselves, which is a pretty big ask. If nobody minds, I'd like to work on this after https://github.com/apache/parquet-mr/pull/1141 is taken care of.
Currently, the codecs implemented by Parquet implement the Hadoop Configurable and CompressionCodec interfaces. As part of the effort to decouple from Hadoop, there need to be alternatives to these Hadoop-based implementations so that users are not forced to load Hadoop classes at runtime just for compression.
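To make the goal concrete, here is a minimal sketch of what a codec without any Hadoop types could look like. It is only an illustration under stated assumptions: the names `HadoopFreeCodecSketch`, `PlainCodec`, and `JdkGzipCodec` are hypothetical and are not part of Parquet or Hadoop, and GZIP is used purely because the JDK ships it in `java.util.zip`. It does not show how such a codec would be wired into a CompressionCodecFactory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: none of these names exist in Parquet or Hadoop.
final class HadoopFreeCodecSketch {

    /** Hypothetical codec contract that depends only on JDK types. */
    interface PlainCodec {
        byte[] compress(byte[] input) throws IOException;

        byte[] decompress(byte[] input) throws IOException;
    }

    /** GZIP codec backed by java.util.zip, with no Hadoop classes loaded. */
    static final class JdkGzipCodec implements PlainCodec {

        @Override
        public byte[] compress(byte[] input) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Closing the GZIP stream writes the trailer into 'out'.
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(input);
            }
            return out.toByteArray();
        }

        @Override
        public byte[] decompress(byte[] input) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPInputStream gz =
                    new GZIPInputStream(new ByteArrayInputStream(input))) {
                byte[] buffer = new byte[4096];
                int read;
                // Copy until the end of the compressed stream is reached.
                while ((read = gz.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            }
            return out.toByteArray();
        }
    }

    private HadoopFreeCodecSketch() {
    }
}
```

A caller would use it as `new HadoopFreeCodecSketch.JdkGzipCodec().compress(bytes)` and pass the result back through `decompress` to recover the original bytes. The point of the sketch is only that neither Configurable nor CompressionCodec from Hadoop appears anywhere in the type signatures.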
Reporter: Atour Mousavi Gourabi / @amousavigourabi
Note: This issue was originally created as PARQUET-2353. Please see the migration documentation for further details.