facebookincubator / velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0
3.29k stars 1.09k forks

Add comprehensive Parquet writer tests #5796

Open kewang1024 opened 11 months ago

kewang1024 commented 11 months ago

Description

As of now, there are no tests for the Parquet writer.

kewang1024 commented 11 months ago

cc @majetideepak

ashokku2022 commented 10 months ago

@yiweiHeOSS will be working on it, thanks!

majetideepak commented 10 months ago

@yiweiHeOSS the scope is to create something like https://github.com/facebookincubator/velox/blob/main/velox/dwio/dwrf/test/WriterTests.cpp for Parquet. Let's start small by creating the test class and testing the compression types. We will also need to fix https://github.com/facebookincubator/velox/issues/5865.

majetideepak commented 10 months ago

The new Writer tests will reside in velox/dwio/parquet/tests/writer/. We need to test all the ParquetWriter options currently supported. They are listed here: https://github.com/facebookincubator/velox/blob/main/velox/dwio/parquet/writer/Writer.h#L84. Of these, compression and flushPolicyFactory are the more important options. We should also confirm that the Presto Parquet options are being honored.

majetideepak commented 10 months ago

Here is the Presto Writer Config: https://github.com/prestodb/presto/blob/852bbfafeefc51b8476deaa8d5aee0c8a29bee57/presto-hive/src/main/java/com/facebook/presto/hive/ParquetFileWriterConfig.java

ethanyzhang commented 2 months ago

Should be closed by https://github.com/facebookincubator/velox/pull/7332 @czentgr

czentgr commented 2 months ago

I suppose one thing missing from this is tests around the flush policy (flushPolicyFactory). The main test added was around the compression options.
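
For illustration, here is a rough sketch of the kind of check a flush policy test might make: configure row and byte thresholds through a factory, write a known number of batches, and compare the resulting row group count with what the thresholds imply. Everything in it (FlushPolicy, FlushPolicyFactory, FakeWriter, countRowGroups) is a hypothetical stand-in written for this example and is not Velox's actual API; a real test would go through the Parquet writer and read the row group count back from the file footer.

```cpp
#include <cstdint>
#include <functional>
#include <memory>

// Hypothetical stand-in, not a Velox type: decides when the writer should
// close the current row group, based on buffered rows or buffered bytes.
struct FlushPolicy {
  FlushPolicy(uint64_t maxRows, uint64_t maxBytes)
      : maxRowsInRowGroup(maxRows), maxBytesInRowGroup(maxBytes) {}

  bool shouldFlush(uint64_t bufferedRows, uint64_t bufferedBytes) const {
    return bufferedRows >= maxRowsInRowGroup ||
        bufferedBytes >= maxBytesInRowGroup;
  }

  uint64_t maxRowsInRowGroup;
  uint64_t maxBytesInRowGroup;
};

// Hypothetical factory type: the writer asks it for a policy at construction.
using FlushPolicyFactory = std::function<std::unique_ptr<FlushPolicy>()>;

// Hypothetical writer that only counts row groups instead of writing a file.
class FakeWriter {
 public:
  explicit FakeWriter(const FlushPolicyFactory& factory)
      : policy_(factory()) {}

  // Buffers one batch, then flushes if the policy says so.
  void write(uint64_t rows, uint64_t bytes) {
    bufferedRows_ += rows;
    bufferedBytes_ += bytes;
    if (policy_->shouldFlush(bufferedRows_, bufferedBytes_)) {
      flush();
    }
  }

  // Flushes whatever is still buffered as a final row group.
  void close() {
    if (bufferedRows_ > 0) {
      flush();
    }
  }

  uint64_t numRowGroups() const {
    return numRowGroups_;
  }

 private:
  void flush() {
    ++numRowGroups_;
    bufferedRows_ = 0;
    bufferedBytes_ = 0;
  }

  std::unique_ptr<FlushPolicy> policy_;
  uint64_t bufferedRows_{0};
  uint64_t bufferedBytes_{0};
  uint64_t numRowGroups_{0};
};

// Writes `batches` equal-sized batches and returns the row group count.
uint64_t countRowGroups(
    uint64_t maxRows,
    uint64_t maxBytes,
    int batches,
    uint64_t rowsPerBatch,
    uint64_t bytesPerBatch) {
  FakeWriter writer([maxRows, maxBytes]() {
    return std::make_unique<FlushPolicy>(maxRows, maxBytes);
  });
  for (int i = 0; i < batches; ++i) {
    writer.write(rowsPerBatch, bytesPerBatch);
  }
  writer.close();
  return writer.numRowGroups();
}
```

In this sketch, ten batches of 250 rows with a 1000-row limit give two full row groups plus one partial group on close, while a 2048-byte limit with 1024-byte batches flushes after every second batch. A real test would assert the same kind of relationship against the row group metadata of the written file.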

ethanyzhang commented 2 months ago

@czentgr can you open a new issue for the missing test?

czentgr commented 2 months ago

Created https://github.com/facebookincubator/velox/issues/9496 and https://github.com/facebookincubator/velox/issues/9499 as beginner issues to cover two of the other relevant options.