Closed dbtsai closed 1 year ago
The Hadoop block compressor/decompressor interfaces are not supported, but the streaming interfaces are. The codecs in this project are designed for interacting with data lake file formats. These either use the streaming interface or, in the case of modern formats like ORC, Parquet and Avro, use the compression algorithms directly, bypassing the Hadoop codecs. We don't intend to add implementations of the Hadoop block APIs, but I expect you could easily build them yourself using the underlying compression implementations in this project. Feel free to fork/copy into the Hadoop code base.
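As a rough illustration of "build them yourself", here is a minimal sketch of wrapping a buffer-to-buffer compression routine in a block-style compressor workflow (setInput, finish, compress, finished). Everything in it is an assumption for illustration: `RawCompressor`, `BlockCompressorAdapter` and `StoreCompressor` are hypothetical names, not classes from Hadoop or from this project, and the adapter covers only a subset of what a real Hadoop compressor has to implement. `StoreCompressor` just copies bytes so the sketch runs without any LZO implementation.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for a raw "compress one buffer into another" routine.
// The method shapes are an assumption, not a real library interface.
interface RawCompressor {
    int maxCompressedLength(int uncompressedSize);

    int compress(byte[] input, int inputOffset, int inputLength,
                 byte[] output, int outputOffset, int maxOutputLength);
}

// Hypothetical adapter: buffers all input, compresses it as one block once
// finish() has been called, then hands the result out in caller-sized pieces.
final class BlockCompressorAdapter {
    private final RawCompressor raw;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private byte[] compressed = new byte[0];
    private int compressedLength;
    private int compressedPosition;
    private boolean finishCalled;
    private boolean blockCompressed;

    BlockCompressorAdapter(RawCompressor raw) {
        this.raw = raw;
    }

    // Accumulate uncompressed input until finish() is called.
    void setInput(byte[] b, int off, int len) {
        if (finishCalled) {
            throw new IllegalStateException("finish() already called");
        }
        pending.write(b, off, len);
    }

    // Signal that no more input will arrive for this block.
    void finish() {
        finishCalled = true;
    }

    // True once the block has been compressed and fully drained.
    boolean finished() {
        return finishCalled && blockCompressed && compressedPosition == compressedLength;
    }

    // Copy up to len compressed bytes into b; returns 0 until finish() is called.
    int compress(byte[] b, int off, int len) {
        if (!finishCalled) {
            return 0;
        }
        if (!blockCompressed) {
            byte[] input = pending.toByteArray();
            compressed = new byte[raw.maxCompressedLength(input.length)];
            compressedLength = raw.compress(input, 0, input.length,
                                            compressed, 0, compressed.length);
            blockCompressed = true;
        }
        int n = Math.min(len, compressedLength - compressedPosition);
        System.arraycopy(compressed, compressedPosition, b, off, n);
        compressedPosition += n;
        return n;
    }
}

// Trivial "store" implementation so the sketch is runnable: output == input.
final class StoreCompressor implements RawCompressor {
    @Override
    public int maxCompressedLength(int uncompressedSize) {
        return uncompressedSize;
    }

    @Override
    public int compress(byte[] input, int inputOffset, int inputLength,
                        byte[] output, int outputOffset, int maxOutputLength) {
        if (inputLength > maxOutputLength) {
            throw new IllegalArgumentException("output buffer too small");
        }
        System.arraycopy(input, inputOffset, output, outputOffset, inputLength);
        return inputLength;
    }
}
```

A real integration would replace `StoreCompressor` with the actual compression implementation and would also need to handle whatever block framing and buffer-size limits the Hadoop side expects, which this sketch does not attempt.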
We are trying to add LzoCodec to Apache Hadoop based on the implementation of aircompressor. https://github.com/apache/hadoop/pull/2159
When we try to integrate it into Hadoop, we get a couple of test failures due to
java.lang.UnsupportedOperationException: LZO block compressor is not supported
We found that this is because LzoCodec in aircompressor has a static class HadoopLzoCompressor that returns a dummy implementation when getCompressor is called. Why don't we return LzoCompressor instead?