Open malhotrashivam opened 5 months ago
Some notes: Important PR: https://github.com/apache/parquet-mr/pull/1141/files#diff-b044ae9879a94e2b8a49d6e6911ea5498ef162df1373cc049ded6256980a7248
One interesting class they have now added is org.apache.parquet.conf.PlainParquetConfiguration, which is meant to replace org.apache.hadoop.conf.Configuration.
Some other interesting classes:
org.apache.parquet.hadoop.CodecFactory, which can potentially replace the usage of org.apache.hadoop.io.compress.CompressionCodecFactory.
org.apache.parquet.hadoop.CodecFactory.HeapBytesCompressor, which can replace org.apache.hadoop.io.compress.Compressor.
org.apache.parquet.hadoop.CodecFactory.HeapBytesDecompressor, which can replace org.apache.hadoop.io.compress.Decompressor.
In the latest v1.14.0 release of parquet-mr (issue PARQUET-1822), they have added a number of wrapper classes that should let users use parquet-hadoop without depending directly on hadoop-common. We should switch to these wrappers to remove the dependency from our own code. Note that parquet-hadoop may still use hadoop-common internally, so the jar may not disappear from the runtime classpath entirely.
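To make the intent concrete, here is a rough, stdlib-only sketch of the pattern these wrappers enable: our code talks to a small configuration interface backed by a plain map, so no Hadoop type appears in our own signatures. Every name in it (ConfigSketch, SimpleConfig, MapConfig, describe, and the example.* keys) is hypothetical and made up for illustration. None of it is parquet-mr or Hadoop API, and the real wrapper classes may look different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in parquet-mr or Hadoop.
public class ConfigSketch {

    /** The small slice of configuration behaviour our own code needs. */
    interface SimpleConfig {
        void set(String name, String value);
        String get(String name, String defaultValue);
        boolean getBoolean(String name, boolean defaultValue);
    }

    /** Map-backed implementation with no Hadoop types in its API. */
    static final class MapConfig implements SimpleConfig {
        private final Map<String, String> values = new HashMap<>();

        @Override
        public void set(String name, String value) {
            values.put(name, value);
        }

        @Override
        public String get(String name, String defaultValue) {
            return values.getOrDefault(name, defaultValue);
        }

        @Override
        public boolean getBoolean(String name, boolean defaultValue) {
            String raw = values.get(name);
            // Fall back to the default only when the key is absent.
            return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
        }
    }

    /** Caller code depends on the interface only, not on a concrete config class. */
    static String describe(SimpleConfig conf) {
        String codec = conf.get("example.codec", "UNCOMPRESSED");
        boolean dictionary = conf.getBoolean("example.dictionary", true);
        return "codec=" + codec + ", dictionary=" + dictionary;
    }

    public static void main(String[] args) {
        SimpleConfig conf = new MapConfig();
        conf.set("example.codec", "SNAPPY");
        conf.set("example.dictionary", "false");
        System.out.println(describe(conf)); // prints "codec=SNAPPY, dictionary=false"
    }
}
```

If our code is structured this way, swapping the Hadoop-backed configuration for the new plain one should mostly be a matter of changing which implementation gets constructed at the entry points.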
Found during #5469