Open qqibrow opened 10 months ago
Just found that reducing the max number of rows to read
from 1000 to 500 mitigates the error:
while (rowReader->next(1000, result)) { // change 1000 to 500 can mitigate the error
  for (vector_size_t i = 0; i < result->size(); i++) {
    std::cout << result->toString(i) << std::endl;
  }
}
@qqibrow Will you be able to create a draft PR to add this test in ParquetReaderTest? You can mark it as disabled and add a comment mentioning this issue until the bug is fixed. Is this urgent? If not, I think it's better to wait for my refactor, in which the null handling will be simplified.
Just FYI, I couldn't repro the issue:
makagonov@makagonov-xps:~/presto/presto-native-execution$ cmake-build-debug/velox/velox/dwio/parquet/tests/reader/velox_scan_parquet ~/Downloads/issue7617.parquet ./out.txt
number of rows: 30000
velox type: ROW<test:ARRAY<INTEGER>>
@makagonov did you change the batch size to 1000?
@makagonov Build with #7642 and run the command:
/home/lniu/code/velox_new/velox/_build/debug/velox/dwio/parquet/tests/reader/velox_dwio_print_parquet --file_path=/home/lniu/issue7617.parquet --batch_size=1000
This will reproduce the issue; I just tried it. The default batch size (500) works fine and does not surface the issue.
@qqibrow Is this still not fixed?
Bug description
Expected behavior:
Able to read the Parquet file with an array-type column that contains 30000 empty arrays. Both parquet-tools and the Presto Parquet reader are able to read the file.
Actual behavior: the Velox native Parquet reader crashed. Not sure whether it's a tuning issue or a bug.
System information
Velox System Info v0.0.2
Commit: 1e186e548833750cdee4b95d829711ddad78aba1
CMake Version: 3.16.3
System: Linux-5.4.0-1063-aws
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt
Relevant logs