facebookincubator / velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0

Do pre-decompress gzip with IAA hardware #5718

Open yaqi-zhao opened 1 year ago

yaqi-zhao commented 1 year ago

Description

The Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator that provides very high-throughput compression and decompression combined with primitive analytic functions. It is available in the latest generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). We can offload GZip decompression (with a 4KB window size) to the IAA hardware and save CPU resources. Here is a description of how GZip decompression is offloaded to the IAA hardware:

[image: offloading GZip decompression to IAA]

We use IAA to do the pre-decompression work in the Velox native Parquet reader and got good performance in a TPC-H-like benchmark. The early Velox test data are as follows (the results are preliminary, vary across environments, and are for reference only rather than a commitment):

- About 2X performance gain compared with the current gzip (4KB window size) solution
- About 20~40% performance gain compared with ZSTD-compressed files
- About 5~10% performance gain compared with Snappy-compressed files

Implementation Brief

To minimize the impact on the current code, we add a new IAAPageReader class. The ParquetReader checks whether IAA can be used to decompress the data; if so, the ParquetReader uses IAAPageReader to do the subsequent work. The criteria checked in ParquetReader are:

Here is a flow chart of the code change:

[image: flow chart of the code change]
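The reader selection described above can be sketched as follows. This is a minimal illustration, not the PR's code: `PageReaderBase` and `makePageReader` are hypothetical names, the reader classes are reduced to stand-ins that only report their name, and the single `iaaUsable` flag stands in for the real ParquetReader check.

```cpp
#include <memory>
#include <string>

// Hypothetical common interface; the real Velox readers are not necessarily
// structured this way.
class PageReaderBase {
 public:
  virtual ~PageReaderBase() = default;
  virtual std::string name() const = 0;
};

// Stand-in for the existing software PageReader.
class PageReader : public PageReaderBase {
 public:
  std::string name() const override { return "PageReader"; }
};

// Stand-in for the proposed IAA-backed reader.
class IAAPageReader : public PageReaderBase {
 public:
  std::string name() const override { return "IAAPageReader"; }
};

// `iaaUsable` is assumed to be the outcome of the ParquetReader check
// (IAA hardware present, zlib stream, window size <= 4KB).
std::unique_ptr<PageReaderBase> makePageReader(bool iaaUsable) {
  if (iaaUsable) {
    return std::make_unique<IAAPageReader>();
  }
  return std::make_unique<PageReader>();
}
```

Keeping the choice in one factory means callers never branch on the decompression path themselves; as the later comments in this thread show, the maintainers preferred folding the IAA logic into PageReader instead of a separate class.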

pedroerp commented 1 year ago

Cc: @oerling @Yuhta

yaqi-zhao commented 1 year ago

Hi @Yuhta @oerling, we have completed the development of the solution as described. Do you think it is reasonable to submit a PR for your review?

george-gu-2021 commented 1 year ago

Regarding the check of "whether IAA can decompress the data", the logic is as follows:

1. Detect whether the system has usable IAA hardware. If yes, go to step 2.
2. Detect whether the compression format is zlib. If yes, go to step 3.
3. Read the window size from the zlib header and check whether it is <= 4KB. If yes, decompress with the IAA hardware.

Otherwise, fall back to the Velox software path to decompress the data.

Hi @yaqi-zhao, feel free to add comments if the above description is inaccurate. Thanks!

yaqi-zhao commented 1 year ago

@george-gu-2021 The zlib header and the history buffer (window size) are checked in one step: we read the compressed data header to check whether it is in zlib header format, and then read the window size from that same header. The rest of the logic you mentioned is correct.
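The single-step header check can be sketched from the zlib format itself (RFC 1950): the first byte (CMF) holds the compression method (CM, low 4 bits, 8 = deflate) and CINFO (high 4 bits, window size = 2^(CINFO + 8)), and the 16-bit value CMF*256 + FLG must be a multiple of 31. This is a minimal sketch, not the PR's implementation; `canDecompressWithIaa` is a hypothetical helper, and its `iaaAvailable` flag stands in for whatever hardware detection the real code performs.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Parses a 2-byte zlib header (RFC 1950) and returns the LZ77 window size in
// bytes, or std::nullopt if the bytes are not a valid zlib header.
std::optional<uint32_t> zlibWindowSize(const uint8_t* data, size_t size) {
  if (size < 2) {
    return std::nullopt;
  }
  const uint8_t cmf = data[0];
  const uint8_t flg = data[1];
  // CM (low 4 bits of CMF) must be 8, i.e. deflate.
  if ((cmf & 0x0F) != 8) {
    return std::nullopt;
  }
  // FCHECK: CMF*256 + FLG must be a multiple of 31.
  if (((static_cast<uint32_t>(cmf) << 8) | flg) % 31 != 0) {
    return std::nullopt;
  }
  // CINFO (high 4 bits) is log2(window size) - 8; values above 7 are invalid.
  const uint32_t cinfo = cmf >> 4;
  if (cinfo > 7) {
    return std::nullopt;
  }
  return 1u << (cinfo + 8);
}

// Hypothetical decision helper: hardware check first, then one header read
// that both validates the zlib format and yields the window size.
bool canDecompressWithIaa(bool iaaAvailable, const uint8_t* data, size_t size) {
  if (!iaaAvailable) {
    return false;
  }
  const std::optional<uint32_t> window = zlibWindowSize(data, size);
  return window.has_value() && *window <= 4096;
}
```

For example, the common header bytes `0x78 0x9C` decode to a 32KB window and would take the software path, while `0x48 0x0D` decode to a 4KB window and qualify for IAA.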

Yuhta commented 1 year ago

I am curious how you implement readWithVisitor. Ideally we should not duplicate any of the logic already in PageReader.

yaqi-zhao commented 1 year ago

Hi @Yuhta, to avoid affecting the current PageReader file, the newly added IAAPageReader duplicates some code: its readWithVisitor uses the same logic as PageReader's. You can see the implementation at https://github.com/facebookincubator/velox/pull/6176/files#diff-3155c9fa38e02bffd7c4bf31b1eed7a39e2f12efdae4b917817eabb388cc9221. If duplicating PageReader's logic is not acceptable, I can move the IAAPageReader logic into PageReader.

Yuhta commented 1 year ago

@yaqi-zhao Yes if you can implement the logic inside PageReader (with a few conditional includes) that would be best.
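One way to read "inside PageReader (with a few conditional includes)" is a single reader class whose hardware branch is compiled only when IAA support is enabled, so the shared logic such as readWithVisitor is never duplicated. This is a minimal sketch under that assumption: `HYPOTHETICAL_ENABLE_IAA` is an illustrative build flag, not a real Velox macro, and `PageReaderSketch::decompressPage` is a hypothetical method that only reports which path it took.

```cpp
#include <string>

// HYPOTHETICAL_ENABLE_IAA is an illustrative build flag, not a real Velox
// macro. Define it to compile the hardware branch in.
// #define HYPOTHETICAL_ENABLE_IAA

#ifdef HYPOTHETICAL_ENABLE_IAA
// The IAA/QPL headers would be included here.
#endif

// Simplified stand-in for PageReader: one class, one code path for the shared
// page-reading logic, with the decompression step branching on the build flag.
class PageReaderSketch {
 public:
  // Returns which path was used so the branching is observable.
  std::string decompressPage(bool iaaUsable) {
#ifdef HYPOTHETICAL_ENABLE_IAA
    if (iaaUsable) {
      return "iaa";  // Hardware decompression would happen here.
    }
#else
    (void)iaaUsable;  // Unused when IAA support is compiled out.
#endif
    return "software";  // Existing software decompression path.
  }
};
```

With the flag undefined (the default above), every call takes the software path, so builds without IAA support behave exactly as before.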

Yuhta commented 1 year ago

@yaqi-zhao I would also recommend rebasing your work on #5914, so that uncompressQplData can be added to the common compression code (maybe with a better async interface).

yaqi-zhao commented 1 year ago

Hi @Yuhta, I have rebased my work on #5914. IAAPageReader has been deleted and the new logic is implemented inside PageReader. The change has been updated in #6176.