apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Deprecate LZ4, introduce new LZ4_RAW #2606

Open asfimport opened 3 years ago

asfimport commented 3 years ago

The currently implemented LZ4 compression is based on the Hadoop codec, which is now deprecated (see PARQUET-1996 for details). In addition, a new, properly specified LZ4 compression (LZ4_RAW) has been introduced in the format.

The idea is to use the new LZ4_RAW compression in all cases where we currently use LZ4, and to introduce a new configuration option that lets the user switch back to the deprecated behavior when the selected codec is LZ4.
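A minimal sketch of how such a switch might look. The property name `parquet.compression.lz4.use-deprecated-hadoop-codec`, the `Lz4CodecResolver` class, and the simplified `Codec` enum are all hypothetical and chosen for illustration; none of them is taken from the parquet-java code base, and a plain `Map` stands in for the real configuration object.

```java
import java.util.Map;

// Illustrative only: maps a requested codec to the codec that would actually be used.
final class Lz4CodecResolver {

    // Hypothetical property name, not an existing parquet-java setting.
    static final String LEGACY_LZ4_KEY =
        "parquet.compression.lz4.use-deprecated-hadoop-codec";

    // Simplified stand-in for the real codec enumeration.
    enum Codec { UNCOMPRESSED, SNAPPY, GZIP, LZ4, LZ4_RAW }

    // Returns LZ4_RAW when LZ4 is requested, unless the legacy flag is set to "true".
    // Every other codec is passed through unchanged.
    static Codec resolve(Codec requested, Map<String, String> conf) {
        if (requested != Codec.LZ4) {
            return requested;
        }
        boolean useLegacy =
            Boolean.parseBoolean(conf.getOrDefault(LEGACY_LZ4_KEY, "false"));
        return useLegacy ? Codec.LZ4 : Codec.LZ4_RAW;
    }
}
```

With an empty configuration, `Lz4CodecResolver.resolve(Codec.LZ4, Map.of())` yields `LZ4_RAW`; setting the hypothetical flag to `"true"` keeps the deprecated `LZ4` behavior. Whether the default should really favor LZ4_RAW is the design question this issue raises, so the sketch only shows one possible answer.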

For LZ4_RAW, we will need to add libraries that provide the officially specified LZ4 raw format.

Reporter: Gabor Szadovszky / @gszadovszky

Note: This issue was originally created as PARQUET-2032. Please see the migration documentation for further details.

asfimport commented 3 years ago

Antoine Pitrou / @pitrou: cc @emkornfield

asfimport commented 3 years ago

Antoine Pitrou / @pitrou: I presume this is for parquet-mr?

asfimport commented 3 years ago

Gabor Szadovszky / @gszadovszky: Yep, I forgot to set the correct component. Thanks for catching that :)