facebookincubator / velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0
3.28k stars 1.08k forks

Accelerate ZSTD Compression for Dwrf writer with QAT #5960

Open yyang52 opened 11 months ago

yyang52 commented 11 months ago

Description

Velox currently implements its own Dwrf writer, which supports several compression codecs. ZSTD is the default codec and is widely used in real workloads. Although ZSTD already performs well, there is still room for improvement: Intel® QuickAssist Technology (Intel® QAT) can accelerate ZSTD compression.

Intel QAT ZSTD plugin:

Intel® QuickAssist Technology (Intel® QAT) provides cryptographic and compression acceleration capabilities to improve performance and efficiency across the data center. The QAT sequence producer offloads the production of block-level sequences for ZSTD compression levels 1 through 12 (L1-L12).

The QAT ZSTD Plugin is a plugin for Zstandard (ZSTD) that accelerates compression with QAT. Since version 1.5.4, ZSTD provides a block-level sequence producer API that lets users register a custom sequence producer, which libzstd then invokes to process each block.

Design & Analysis

We wrote some writer benchmarks and tests to identify hotspots and assess the optimization potential.

In our earlier tests, ZSTD_compress was the top hotspot when writing Parquet, taking around 31% of total CPU time. Writing Dwrf should behave similarly (we are running more tests to confirm this), so accelerating compression should benefit end-to-end workloads.

kgpai commented 11 months ago

Thanks for the nice description and motivation for the change @yyang52 !

Yuhta commented 11 months ago

@zzhao0 Do we have QAT hardware internally? If so, we can try turning it on internally and see how much it improves writing efficiency.

Yuhta commented 11 months ago

@yyang52 How to check from command line if QAT is supported on a host?

klmckeig commented 11 months ago

@yyang52 How to check from command line if QAT is supported on a host?

Depends on the drivers. For in-tree drivers,

dmesg | grep qat
lspci | grep 494

For out-of-tree drivers,

/usr/local/bin/adf_ctl status
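
The checks in this comment could be combined into one helper, roughly as sketched below. This is only an illustration: the function name qat_check is hypothetical, and the sketch assumes that adf_ctl exits with status 0 when the out-of-tree driver is working, which is not confirmed in this thread. Reading dmesg may need elevated privileges, and the 494 pattern is a loose match taken from the command above, so a hit is a hint rather than proof.

```shell
#!/bin/sh
# Sketch only: wraps the commands quoted in this thread.
# The function name qat_check is hypothetical.

qat_check() {
    # Out-of-tree driver: use adf_ctl if it is installed at the quoted path.
    # Assumption: a zero exit status means the driver reported its devices.
    if [ -x /usr/local/bin/adf_ctl ]; then
        /usr/local/bin/adf_ctl status && return 0
    fi

    # In-tree driver: look for QAT messages in the kernel log.
    # dmesg may be restricted to root; errors are silenced and treated as "no match".
    if dmesg 2>/dev/null | grep -i qat >/dev/null; then
        echo "kernel log mentions qat"
        return 0
    fi

    # In-tree driver: look for a PCI device whose listing contains 494.
    if command -v lspci >/dev/null 2>&1 && lspci 2>/dev/null | grep 494 >/dev/null; then
        echo "lspci lists a device matching 494"
        return 0
    fi

    echo "no QAT device detected"
    return 1
}
```

Sourcing the file and running qat_check prints one status line (or the adf_ctl output) and returns 0 when any of the checks matches, 1 otherwise, so it can gate a benchmark script with `qat_check && run_benchmark`, where run_benchmark is a placeholder.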