airlift / aircompressor

A port of Snappy, LZO, LZ4, and Zstandard to Java
Apache License 2.0

Lzop Codec in Apache Spark #131

Closed: srinicodeit closed this issue 1 year ago

srinicodeit commented 3 years ago

Is there any documentation on how to use the LzopCodec codec in Apache Spark?

dain commented 1 year ago

You will need to create a subclass of the codec that has the exact class name Hadoop used for it, because Hadoop, unfortunately, encodes the codec's class name into its file formats. Here is how we did this in a Trino test: https://github.com/trinodb/trino/blob/master/lib/trino-hive-formats/src/test/java/com/hadoop/compression/lzo/LzopCodec.java#L18
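To illustrate why the class name matters, here is a minimal, self-contained sketch of name-based codec lookup. Every name in it (`CodecNameDemo`, `Codec`, `LibraryLzopCodec`, `ExpectedLzopCodec`, `resolve`) is a hypothetical stand-in and not part of aircompressor, Hadoop, or Spark. It also assumes the reader resolves the recorded name reflectively. In a real setup, judging by the path of the linked Trino test, the empty subclass would be declared as `LzopCodec` in the `com.hadoop.compression.lzo` package and extend this library's codec.

```java
import java.util.Optional;

public class CodecNameDemo {
    /** Hypothetical stand-in for a codec interface. */
    public interface Codec {
        String describe();
    }

    /** Hypothetical stand-in for the library's codec, published under the library's own name. */
    public static class LibraryLzopCodec implements Codec {
        public LibraryLzopCodec() {}

        @Override
        public String describe() {
            return "lzop codec loaded as " + getClass().getName();
        }
    }

    /** Empty subclass: it adds no behavior, it only carries the name that the file records. */
    public static class ExpectedLzopCodec extends LibraryLzopCodec {
        public ExpectedLzopCodec() {}
    }

    /** Looks a codec up purely by the class name stored in a file header (assumed behavior). */
    public static Optional<Codec> resolve(String recordedClassName) {
        try {
            Class<?> type = Class.forName(recordedClassName);
            return Optional.of((Codec) type.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | ClassCastException e) {
            // No usable class with exactly that name is on the classpath.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // The name a file header would have recorded.
        String recorded = ExpectedLzopCodec.class.getName();
        System.out.println(resolve(recorded).map(Codec::describe).orElse("codec not found"));

        // A name that no class on the classpath carries cannot be resolved.
        System.out.println(resolve(recorded + "Missing").map(Codec::describe).orElse("codec not found"));
    }
}
```

The subclass body stays empty on purpose: the inherited implementation does all the work, and the subclass only makes it reachable under the recorded name.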