facebookincubator / velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0

Failed to read the parquet file #10395

Closed: weixiuli closed this issue 2 months ago

weixiuli commented 3 months ago

Problem description

io.glutenproject.exception.GlutenException: java.lang.RuntimeException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: ( vs. )
Retriable: False
Expression: header == strings + numBytes
Context: Split [Hive: oss://•d_di/dtm=20240623/part-00994-c947690b-d0a8-4bc9-8a30-f4c45b41963b.c000.zstd.parquet 0 - 17091072] Task Gluten_Stage_1_TID_7
Top-Level Context: Same as context.
Function: prepareDictionary
File: /root/zhongqing/git/gluten/ep/build-velox/build/velox_ep/velox/dwio/parquet/reader/PageReader.cpp
Line: 425
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox7parquet10PageReader17prepareDictionaryERKNS1_6thrift10PageHeaderE
# 4  _ZN8facebook5velox7parquet10PageReader10seekToPageEl
# 5  _ZN8facebook5velox7parquet10PageReader11rowsForPageERNS0_4dwio6common21SelectiveColumnReaderEbbRN5folly5RangeIPKiEERPKm
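Judging by the expression text, the failing check in `prepareDictionary` compares a read cursor (`header`) against the end of the dictionary bytes (`strings + numBytes`). The sketch below illustrates that kind of end-of-buffer invariant. It is not Velox's implementation. It assumes an uncompressed dictionary page holding PLAIN-encoded BYTE_ARRAY values, which the Parquet format stores as a 4-byte little-endian length followed by the value bytes. The functions `decodePlainByteArrayDictionary` and `encodePlainByteArrays` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not Velox's API): decodes `numValues` PLAIN-encoded
// BYTE_ARRAY values from a dictionary page body of `numBytes` bytes.
std::vector<std::string_view> decodePlainByteArrayDictionary(
    const char* data, std::size_t numBytes, std::int32_t numValues) {
  std::vector<std::string_view> values;
  values.reserve(static_cast<std::size_t>(numValues > 0 ? numValues : 0));
  const char* header = data;           // read cursor
  const char* end = data + numBytes;   // end of the page body
  for (std::int32_t i = 0; i < numValues; ++i) {
    if (end - header < 4) {
      throw std::runtime_error("truncated length prefix");
    }
    // Assemble the little-endian length byte by byte so the result does not
    // depend on host endianness.
    std::uint32_t len = 0;
    for (int b = 0; b < 4; ++b) {
      len |= static_cast<std::uint32_t>(static_cast<unsigned char>(header[b]))
             << (8 * b);
    }
    header += 4;
    if (static_cast<std::size_t>(end - header) < len) {
      throw std::runtime_error("truncated value");
    }
    values.emplace_back(header, len);
    header += len;
  }
  // Analogue of the failing check `header == strings + numBytes`: after
  // decoding numValues entries, the cursor must sit exactly at the page end.
  if (header != end) {
    throw std::runtime_error("dictionary page size mismatch");
  }
  return values;
}

// Hypothetical test helper: builds a PLAIN-encoded BYTE_ARRAY buffer.
std::string encodePlainByteArrays(const std::vector<std::string>& values) {
  std::string out;
  for (const auto& v : values) {
    const auto len = static_cast<std::uint32_t>(v.size());
    for (int b = 0; b < 4; ++b) {
      out.push_back(static_cast<char>((len >> (8 * b)) & 0xFF));
    }
    out += v;
  }
  return out;
}
```

For example, a buffer encoding two values decodes cleanly when the declared count is 2. If the declared count is 1, the cursor stops 6 bytes short of the end and the size check throws. In this sketch, any disagreement between the declared value count, the declared page size, and the actual bytes trips the same invariant.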

System information

Velox System Info v0.0.2
Commit: adc5219b5f720cf2d95616f480fa3ea0934da128
CMake Version: 3.20.2
System: Linux-5.4.119-19-0009.11
Arch: x86_64
C++ Compiler: /opt/rh/gcc-toolset-9/root/usr/bin/c++
C++ Compiler Version: 9.2.1
C Compiler: /opt/rh/gcc-toolset-9/root/usr/bin/cc
C Compiler Version: 9.2.1
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt

CMake log

No response

yingsu00 commented 3 months ago

@weixiuli Will you be able to attach a Parquet file that can reproduce this error? Thanks!

weixiuli commented 2 months ago

PR https://github.com/facebookincubator/velox/pull/9223 may fix this issue; I have cherry-picked it.

majetideepak commented 2 months ago

Please re-open if https://github.com/facebookincubator/velox/pull/9223 does not fix this.