apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

Failed to read from Parquet file (Reason: No decoder to skip) #4011

Open leesf opened 11 months ago

leesf commented 11 months ago

Backend

VL (Velox)

Bug description

23/12/12 14:05:38 WARN TaskSetManager: Lost task 0.0 in stage 11.0 (TID 1522) (21.9.13.22 executor 17): io.glutenproject.exception.GlutenException: java.lang.RuntimeException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: No decoder to skip
Retriable: False
Context: Split [Hive: oss://xxx/data/stat_date=20230820/event_type=page/06c19ce7-aafd-478f-9896-60ef470b22ab-0_301-9855-0_20231017103143661.parquet 0 - 118609468] Task Gluten [Stage: 11 TID: 1522]
Top-Level Context: Same as context.
Function: skip
File: ../../velox/dwio/parquet/reader/PageReader.cpp
Line: 753
Stack trace:
# 0 _ZN8facebook5velox7process10StackTraceC1Ei
# 1 _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2 _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorEPKcEEvRKNS1_18VeloxCheckFailArgsET0_
# 3 0x0000000000000000
# 4 _ZN8facebook5velox7parquet18StringColumnReader4skipEm
# 5 _ZN8facebook5velox4dwio6common21SelectiveColumnReader6seekToEib
# 6 _ZN8facebook5velox7parquet15MapColumnReader4readEiN5folly5RangeIPKiEEPKm
# 7 _ZN8facebook5velox4dwio6common31SelectiveStructColumnReaderBase4readEiN5folly5RangeIPKiEEPKm
# 8 _ZN8facebook5velox4dwio6common31SelectiveStructColumnReaderBase4nextEmRSt10shared_ptrINS0_10BaseVectorEEPKNS2_8MutationE
# 9 _ZN8facebook5velox7parquet16ParquetRowReader4nextEmRSt10shared_ptrINS0_10BaseVectorEEPKNS0_4dwio6common8MutationE
# 10 _ZN8facebook5velox9connector4hive14HiveDataSource4nextEmRN5folly10SemiFutureINS4_4UnitEEE
# 11 _ZN8facebook5velox4exec9TableScan9getOutputEv
# 12 _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 13 _ZN8facebook5velox4exec6Driver4nextERSt10shared_ptrINS1_13BlockingStateEE
# 14 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 15 _ZN6gluten24WholeStageResultIterator4nextEv
# 16 Java_io_glutenproject_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 17 0x00007ff336469730
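
For triage, the key fields of the error above (reason, source file, line) can be pulled out of an executor log programmatically. This is a minimal sketch, assuming the log keeps the `Key: value` layout shown above; `parse_velox_error` and the `FIELDS` tuple are hypothetical helpers for illustration and are not part of Gluten or Velox.

```python
# Field names as they appear in the Velox error text quoted above.
FIELDS = ("Error Source", "Error Code", "Reason", "Retriable",
          "Function", "File", "Line")


def parse_velox_error(text):
    """Return a dict of the known 'Key: value' fields found in the text.

    Lines whose key is not in FIELDS (log prefixes, Context, stack frames)
    are ignored. If a field appears more than once, the first value wins.
    """
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        key = key.strip()
        if sep and key in FIELDS:
            result.setdefault(key, value.strip())
    return result


# Sample copied from the error block in this report.
sample = """Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: No decoder to skip
Retriable: False
Function: skip
File: ../../velox/dwio/parquet/reader/PageReader.cpp
Line: 753"""

info = parse_velox_error(sample)
print(info["Reason"], "at", info["File"] + ":" + info["Line"])
```

Grouping failed tasks by the `Reason` and `File`/`Line` pair makes it easier to tell whether all failures come from the same check in `PageReader.cpp`.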

Spark version

Spark-3.3.x

Spark configurations

Spark 3.3 with Gluten configurations

System information

CentOS 8

Relevant logs

No response

acvictor commented 9 months ago

Do you have any data to repro?