apache / orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
https://orc.apache.org/
Apache License 2.0

ORC-817, ORC-1088: Support ZStandard compression using zstd-jni #1743

Closed cxzl25 closed 9 months ago

cxzl25 commented 10 months ago

What changes were proposed in this pull request?

Original PR: https://github.com/apache/orc/pull/988
Original author: @dchristle

This PR adds support for using the zstd-jni library as ORC's zstd implementation, which performs better than aircompressor (https://github.com/apache/orc/pull/988#issuecomment-1884443205).

This PR also exposes the compression level and "long mode" settings to ORC users. These settings allow the user to select different speed/compression trade-offs that were not supported by the original aircompressor.

Why are the changes needed?

These changes make sense for a few reasons:

ORC users will gain all the improvements from the main zstd library. It is under active development and receives regular speed and compression improvements. In contrast, aircompressor's zstd implementation is older and stale.

ORC users will be able to use the entire speed/compression tradeoff space. Today, aircompressor's implementation has only one of eight compression strategies (link). This means only a small range of faster but less compressive strategies can be exposed to ORC users. ORC storage with high compression (e.g. for large-but-infrequently-used data) is a clear use case that this PR would unlock.

It will harmonize the Java ORC implementation with other projects in the Hadoop ecosystem. Parquet, Spark, and even the C++ ORC reader/writers all rely on the official zstd implementation either via zstd-jni or directly. In this way, the Java reader/writer code is an outlier.

Detecting and fixing bugs or regressions will generally happen much faster, given the larger user base and active developer community of zstd and zstd-jni.
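To make the speed/compression trade-off mentioned above concrete, here is a minimal sketch. It does not use zstd or ORC at all: it substitutes zlib from the Python standard library as a stand-in, purely to show how a configurable compression level trades time for size. The function name `compare_levels` and the sample payload are hypothetical and chosen only for illustration.

```python
import time
import zlib


def compare_levels(payload: bytes, levels=(1, 6, 9)):
    """Return {level: (compressed_size, seconds)} for each zlib level.

    zlib is used here only as a stand-in for zstd; the numbers say
    nothing about zstd-jni or aircompressor themselves.
    """
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(payload, level)
        elapsed = time.perf_counter() - start
        # Every level must round-trip to the original bytes.
        assert zlib.decompress(compressed) == payload
        results[level] = (len(compressed), elapsed)
    return results


# Hypothetical sample payload: repetitive CSV-like rows.
sample = b"".join(
    b"%d,store-%d,%d.99\n" % (i, i % 50, i % 1000) for i in range(20000)
)

for level, (size, seconds) in compare_levels(sample).items():
    print(f"level {level}: {size} bytes in {seconds * 1000:.1f} ms")
```

Exposing the level as a setting lets each user pick a point on this curve instead of being fixed to a single strategy.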

The largest tradeoff is that zstd-jni wraps compiled code. That said, many microprocessor architectures are already targeted & bundled into zstd-jni, so this should be a rare hurdle.

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

No

dongjoon-hyun commented 9 months ago

I changed the default level to 1 and ran a quick comparison with the generate benchmark. Even at level 1, the zstd-jni output is still smaller than the aircompressor output.

JAVA (Aircompressor)

$ java -Dorc.compression.zstd.impl=java -jar core/target/orc-benchmarks-core-*-uber.jar generate data -f orc -d sales -s 100000
$ ls -alR data | tail -n3
-rw-r--r--  1 dongjoon  staff  10746324 Jan 16 14:23 orc.gz
-rw-r--r--  1 dongjoon  staff  12133885 Jan 16 14:23 orc.snappy
-rw-r--r--  1 dongjoon  staff  10642346 Jan 16 14:23 orc.zstd

ZSTD-JNI

$ java -jar core/target/orc-benchmarks-core-*-uber.jar generate data -f orc -d sales -s 100000
$ ls -alR data | tail -n3
-rw-r--r--  1 dongjoon  staff  10746324 Jan 16 14:23 orc.gz
-rw-r--r--  1 dongjoon  staff  12133885 Jan 16 14:23 orc.snappy
-rw-r--r--  1 dongjoon  staff  10543260 Jan 16 14:23 orc.zstd
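The size differences in the two listings can be worked out directly. This small sketch copies the byte counts from the `ls` output above; the helper name `percent_smaller` is hypothetical.

```python
def percent_smaller(baseline: int, candidate: int) -> float:
    """Percentage by which candidate is smaller than baseline."""
    return (baseline - candidate) / baseline * 100


# File sizes in bytes, copied from the two listings above.
aircompressor_zstd = 10642346  # orc.zstd with -Dorc.compression.zstd.impl=java
zstd_jni_level1 = 10543260     # orc.zstd with zstd-jni, default level set to 1
orc_gz = 10746324
orc_snappy = 12133885

for name, size in [
    ("aircompressor orc.zstd", aircompressor_zstd),
    ("orc.gz", orc_gz),
    ("orc.snappy", orc_snappy),
]:
    pct = percent_smaller(size, zstd_jni_level1)
    print(f"zstd-jni orc.zstd is {pct:.2f}% smaller than {name}")
```

On this dataset the gap between the two zstd implementations is 99086 bytes, a little under 1%.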

dongjoon-hyun commented 9 months ago

Thank you, @cxzl25 and all!

cxzl25 commented 9 months ago

Thanks for all the help!

When migrating a table from zlib to zstd, aircompressor achieved a compression rate of 35%. After adjusting some zstd-jni parameters, the compression rate reached 44%.

dongjoon-hyun commented 9 months ago

My bad. It seems that I introduced a regression in Taxi data compression.

ORC 1.9

data/generated//taxi:
total 2196176
drwxr-xr-x  5 dongjoon  staff   160B Jan 17 08:02 .
drwxr-xr-x  5 dongjoon  staff   160B Jan 17 08:07 ..
-rw-r--r--  1 dongjoon  staff   299M Jan 17 08:03 orc.zstd

ORC 2.0

-rw-r--r--  1 dongjoon  staff   334M Jan 17 07:56 orc.zstd (level 1)
-rw-r--r--  1 dongjoon  staff   299M Jan 17 08:16 orc.zstd (level 3)
-rw-r--r--  1 dongjoon  staff   302M Jan 17 08:21 orc.zstd (level 4)
-rw-r--r--  1 dongjoon  staff   300M Jan 17 08:27 orc.zstd (level 5)

The zstd compression level behaves inconsistently on this dataset, so let me change the default zstd level back to 3, as in the original proposal.
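For reference, the listings above can be compared numerically. This sketch uses the sizes as printed (rounded to whole megabytes by `ls`), so the percentages are approximate; the helper name `percent_larger` is hypothetical.

```python
def percent_larger(size_mb: int, baseline_mb: int) -> float:
    """Percentage by which size_mb exceeds baseline_mb."""
    return (size_mb - baseline_mb) / baseline_mb * 100


# Sizes in MB as printed in the listings above (rounded by ls).
orc19_zstd_mb = 299
orc20_zstd_mb = {1: 334, 3: 299, 4: 302, 5: 300}

for level in sorted(orc20_zstd_mb):
    size = orc20_zstd_mb[level]
    pct = percent_larger(size, orc19_zstd_mb)
    print(f"level {level}: {size}M ({pct:+.1f}% vs ORC 1.9)")

# A higher level would normally be expected to give a file that is no larger.
levels = sorted(orc20_zstd_mb)
monotonic = all(
    orc20_zstd_mb[a] >= orc20_zstd_mb[b] for a, b in zip(levels, levels[1:])
)
print("sizes shrink monotonically with level:", monotonic)
```

Level 1 is roughly 11.7% larger than the ORC 1.9 file, level 3 matches it, and level 4 comes out larger than both level 3 and level 5, which is the inconsistency noted above.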