Open yaqi-zhao opened 1 year ago
Cc: @oerling @Yuhta
Hi @Yuhta @oerling, we have completed development of the solution described in this issue. Would it be reasonable to submit a PR for your review?
Regarding the check for "whether IAA can decompress the data", the logic is as follows:

1. Detect whether the system includes usable IAA hardware. If yes, go to step 2.
2. Detect whether the compression format is zlib. If yes, go to step 3.
3. Read the window size from the zlib header and check whether it is <= 4KB. If yes, use IAA hardware decompression.

Otherwise, fall back to the Velox software path to decompress the data.
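The zlib header check in the steps above can be sketched as follows. This is a minimal illustration, not the code from the PR: it assumes the standard 2-byte zlib header layout from RFC 1950, and the names `zlibWindowSize`, `canUseIaa`, and `kMaxIaaWindowBytes` are hypothetical. The `iaaAvailable` flag stands in for whatever hardware detection the real implementation performs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical limit taken from the discussion: IAA is only used when the
// zlib window size is at most 4 KB.
constexpr std::size_t kMaxIaaWindowBytes = 4 * 1024;

// Parses the 2-byte zlib header (RFC 1950) and returns the window size in
// bytes, or std::nullopt if the bytes do not form a valid zlib header.
std::optional<std::size_t> zlibWindowSize(const std::uint8_t* data,
                                          std::size_t size) {
  assert(data != nullptr || size == 0);
  if (size < 2) {
    return std::nullopt;
  }
  const std::uint8_t cmf = data[0];
  const std::uint8_t flg = data[1];
  // CM (low 4 bits of CMF) must be 8, which means "deflate".
  if ((cmf & 0x0F) != 8) {
    return std::nullopt;
  }
  // CINFO (high 4 bits of CMF) is log2(window size) - 8; above 7 is invalid.
  const unsigned cinfo = cmf >> 4;
  if (cinfo > 7) {
    return std::nullopt;
  }
  // FCHECK: CMF and FLG read as a big-endian 16-bit value must be a
  // multiple of 31.
  if (((cmf << 8) | flg) % 31 != 0) {
    return std::nullopt;
  }
  return std::size_t{1} << (cinfo + 8);
}

// Steps 1 to 3 combined: hardware present, zlib format, window <= 4 KB.
// Returning false means "use the software path".
bool canUseIaa(bool iaaAvailable, const std::uint8_t* data, std::size_t size) {
  if (!iaaAvailable) {
    return false;
  }
  const auto window = zlibWindowSize(data, size);
  return window.has_value() && *window <= kMaxIaaWindowBytes;
}
```

In this sketch a header of `0x48 0x0D` describes a 4 KB window and would qualify, while the common `0x78 0x9C` header describes a 32 KB window and would take the software path.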
Hi @yaqi-zhao, feel free to add your comments if the above description is inaccurate. Thanks!
@george-gu-2021 The zlib header and history buffer are checked in one step: we read the header of the compressed data to check whether it is in zlib header format, and then read the window size from that header. The rest of the logic you described is right.
I am curious how you implement `readWithVisitor`. Ideally we should not duplicate any of the logic already in `PageReader`.
Hi @Yuhta, to avoid affecting the current `PageReader` file, the newly added `IAAPageReader` does contain some duplicate code. `readWithVisitor` in `IAAPageReader` uses the same logic as in `PageReader`. You can see the implementation at https://github.com/facebookincubator/velox/pull/6176/files#diff-3155c9fa38e02bffd7c4bf31b1eed7a39e2f12efdae4b917817eabb388cc9221.
I can move the logic of `IAAPageReader` into `PageReader` if duplicating the `PageReader` logic is not acceptable.
@yaqi-zhao Yes, if you can implement the logic inside `PageReader` (with a few conditional includes), that would be best.
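One way to read "a few conditional includes" is a single reader whose decompression step tries the hardware path only when it is compiled in, and otherwise uses the existing software path. The sketch below is illustrative only: `EXAMPLE_ENABLE_IAA`, `example_iaa_decompressor.h`, `exampleIaaTryDecompress`, and `PageReaderSketch` are hypothetical names, not Velox's actual macro, header, or classes, and the software path is a pass-through stand-in.

```cpp
#include <cstdint>
#include <vector>

// EXAMPLE_ENABLE_IAA is a hypothetical build flag, not Velox's actual macro.
#ifdef EXAMPLE_ENABLE_IAA
// Hypothetical header assumed to declare:
//   bool exampleIaaTryDecompress(const std::vector<std::uint8_t>& in,
//                                std::vector<std::uint8_t>& out);
#include "example_iaa_decompressor.h"
#endif

enum class DecompressPath { kHardware, kSoftware };

// Simplified stand-in for a page reader that keeps one code path and only
// branches at the decompression step.
struct PageReaderSketch {
  DecompressPath lastPath = DecompressPath::kSoftware;

  // Placeholder for the existing software decompression (here: pass-through).
  std::vector<std::uint8_t> decompressSoftware(
      const std::vector<std::uint8_t>& in) {
    return in;
  }

  std::vector<std::uint8_t> decompressPage(
      const std::vector<std::uint8_t>& in) {
#ifdef EXAMPLE_ENABLE_IAA
    // Try the accelerator first; fall through to software if it declines.
    std::vector<std::uint8_t> out;
    if (exampleIaaTryDecompress(in, out)) {
      lastPath = DecompressPath::kHardware;
      return out;
    }
#endif
    lastPath = DecompressPath::kSoftware;
    return decompressSoftware(in);
  }
};
```

With this shape, logic such as `readWithVisitor` would stay in one place, and a build without the flag compiles to the software-only reader.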
@yaqi-zhao I would also recommend rebasing your work on #5914, so that `uncompressQplData` can be added to the common compression code (maybe with a better async interface).
Hi @Yuhta, I have rebased my work on #5914. `IAAPageReader` has been deleted and the new logic is implemented inside `PageReader`. The change has been pushed to #6176.
Description
The Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator that provides very high throughput compression and decompression combined with primitive analytic functions. It is available in the newest generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). We can offload GZip decompression (with a 4KB window size) to the IAA hardware and save CPU bandwidth. This issue describes how to do that. We use IAA to pre-decompress data in the Velox native Parquet reader and see good performance in a TPC-H-like benchmark. The early Velox test data are as follows (the results are preliminary and vary across environments; they are for reference only, not a commitment):
- About 2X performance gain compared with the current gzip (window size is 4KB) solution
- About 20~40% performance gain compared with ZSTD-compressed files
- About 5~10% performance gain compared with Snappy-compressed files
Implementation Brief
To minimize the effect on the current code, we add a new IAAPageReader class. The ParquetReader checks whether IAA can be used to decompress the data. If so, the ParquetReader uses IAAPageReader for the subsequent work. The criteria checked in ParquetReader are:
Here is a flow chart of the code change: ![Uploading image.png…]()