airlift / aircompressor

A port of Snappy, LZO, LZ4, and Zstandard to Java
Apache License 2.0

HadoopLzoCompressor vs LzoCompressor? #115

Closed: dbtsai closed this issue 1 year ago

dbtsai commented 3 years ago

We are trying to add LzoCodec to Apache Hadoop based on the implementation of aircompressor. https://github.com/apache/hadoop/pull/2159

When we try to integrate it into Hadoop, a couple of tests fail with java.lang.UnsupportedOperationException: LZO block compressor is not supported. We found that this is because aircompressor's LzoCodec contains a static class, HadoopLzoCompressor, which is a dummy implementation returned when getCompressor is called. Why not return LzoCompressor instead?

dain commented 1 year ago

The Hadoop block compressor/decompressor interfaces are not supported, but the streaming interfaces are. The codecs in this project are designed for working with data lake file formats. These either use the streaming interface or, in the case of modern formats like ORC, Parquet, and Avro, use the compression algorithms directly, bypassing the Hadoop codecs. We don't intend to add implementations of the Hadoop block APIs, but I expect you could easily build them yourself on top of the underlying compression implementations in this project. Feel free to fork/copy them into the Hadoop code base.
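A minimal sketch of building a block-style compressor yourself, under stated assumptions: it buffers one whole block, compresses it with a single one-shot call, and then hands the compressed bytes out in pieces. The method names (setInput, needsInput, finish, finished, compress, reset) mirror Hadoop's org.apache.hadoop.io.compress.Compressor, but the Hadoop interface is not imported so the example runs on the JDK alone. BlockCodec is a hypothetical interface roughly shaped like aircompressor's Compressor (maxCompressedLength plus compress into a caller-supplied array), and java.util.zip.Deflater stands in for LZO; a real adapter would plug in aircompressor's LzoCompressor instead. The sketch also ignores Hadoop's buffer-size limits and the length-prefixed framing that BlockCompressorStream writes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class Main {
    /** Hypothetical one-shot codec, roughly shaped like aircompressor's Compressor interface. */
    interface BlockCodec {
        int maxCompressedLength(int uncompressedSize);
        int compress(byte[] in, int inOff, int inLen, byte[] out, int outOff, int maxOutLen);
    }

    /** Stand-in codec backed by java.util.zip.Deflater (NOT LZO) so the demo runs on the JDK alone. */
    static final class DeflateBlockCodec implements BlockCodec {
        public int maxCompressedLength(int n) {
            return n + (n >> 3) + 64; // loose upper bound on zlib output size
        }

        public int compress(byte[] in, int inOff, int inLen, byte[] out, int outOff, int maxOutLen) {
            Deflater deflater = new Deflater();
            try {
                deflater.setInput(in, inOff, inLen);
                deflater.finish();
                int written = 0;
                while (!deflater.finished()) {
                    if (written == maxOutLen) throw new IllegalStateException("output buffer too small");
                    written += deflater.deflate(out, outOff + written, maxOutLen - written);
                }
                return written;
            } finally {
                deflater.end();
            }
        }
    }

    /** Buffers one whole block, compresses it in a single call, then drains the result.
     *  Method names mirror org.apache.hadoop.io.compress.Compressor; the interface is not imported. */
    static final class BufferingBlockCompressor {
        private final BlockCodec codec;
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
        private byte[] compressed; // produced lazily by the first compress() after finish()
        private int compressedPos;
        private boolean finishCalled;

        BufferingBlockCompressor(BlockCodec codec) { this.codec = codec; }

        void setInput(byte[] b, int off, int len) { pending.write(b, off, len); }

        boolean needsInput() { return !finishCalled; } // simplification: the buffer is unbounded

        void finish() { finishCalled = true; }

        boolean finished() { return compressed != null && compressedPos == compressed.length; }

        int compress(byte[] b, int off, int len) {
            if (!finishCalled) return 0; // nothing to emit until the block is complete
            if (compressed == null) {
                byte[] raw = pending.toByteArray();
                byte[] buf = new byte[codec.maxCompressedLength(raw.length)];
                int n = codec.compress(raw, 0, raw.length, buf, 0, buf.length);
                compressed = Arrays.copyOf(buf, n);
            }
            int n = Math.min(len, compressed.length - compressedPos);
            System.arraycopy(compressed, compressedPos, b, off, n);
            compressedPos += n;
            return n;
        }

        void reset() {
            pending.reset();
            compressed = null;
            compressedPos = 0;
            finishCalled = false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] input = "hello hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        BufferingBlockCompressor c = new BufferingBlockCompressor(new DeflateBlockCodec());
        c.setInput(input, 0, input.length);
        c.finish();

        // Drain the compressed block through a deliberately small output buffer.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8];
        while (!c.finished()) {
            out.write(chunk, 0, c.compress(chunk, 0, chunk.length));
        }

        // Round-trip check with the matching JDK Inflater.
        Inflater inflater = new Inflater();
        inflater.setInput(out.toByteArray());
        byte[] restored = new byte[input.length];
        int n = inflater.inflate(restored);
        inflater.end();
        System.out.println(n == input.length && Arrays.equals(input, restored)); // prints true
    }
}
```

The buffering is the key design point: a block codec like LZO compresses a bounded chunk in one shot, so an adapter has to hold input until finish() before it can emit anything, which is why compress() returns 0 until then.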