apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[VL] TableScan may report an error when using filecache. #7805

Open aqluheng opened 2 weeks ago

aqluheng commented 2 weeks ago

Backend

VL (Velox)

Bug description

When using Gluten 1.2.0 with the file cache enabled using the settings below, TableScan reports an error once a read hits the file cache.

spark.gluten.sql.columnar.backend.velox.fileHandleCacheEnabled  true
spark.gluten.sql.columnar.backend.velox.cacheEnabled            true
spark.gluten.sql.columnar.backend.velox.memCacheSize            2500000000
spark.gluten.sql.columnar.backend.velox.ssdCacheSize            0

TPCDS32 q9 reports this error.

24/11/04 20:45:08 WARN TaskSetManager: Lost task 91.0 in stage 6.0 (TID 12659) (node3 executor 20): org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Error during calling Java code from native code: org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: No magic bytes found at end of the Parquet file
Retriable: False
Expression: strncmp(copy.data() + readSize - 4, "PAR1", 4) == 0
Additional Context: Operator: TableScan[0] 0
Function: loadFileMetaData
File: /home/luheng/gluten/ep/build-velox/build/velox_ep/velox/dwio/parquet/reader/ParquetReader.cpp
Line: 181
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorEPKcEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox7parquet10ReaderBase16loadFileMetaDataEv
# 4  _ZN8facebook5velox7parquet10ReaderBaseC1ESt10unique_ptrINS0_4dwio6common13BufferedInputESt14default_deleteIS6_EERKNS5_13ReaderOptionsE
# 5  _ZN8facebook5velox7parquet13ParquetReaderC2ESt10unique_ptrINS0_4dwio6common13BufferedInputESt14default_deleteIS6_EERKNS5_13ReaderOptionsE
# 6  _ZN8facebook5velox7parquet20ParquetReaderFactory12createReaderESt10unique_ptrINS0_4dwio6common13BufferedInputESt14default_deleteIS6_EERKNS5_13ReaderOptionsE
# 7  _ZN8facebook5velox9connector4hive11SplitReader12createReaderESt10shared_ptrINS0_6common14MetadataFilterEERKS4_INS2_16HiveColumnHandleEE
# 8  _ZN8facebook5velox9connector4hive11SplitReader12prepareSplitESt10shared_ptrINS0_6common14MetadataFilterEERNS0_4dwio6common17RuntimeStatisticsERKS4_INS2_16HiveColumnHandleEE
# 9  _ZN8facebook5velox9connector4hive14HiveDataSource8addSplitESt10shared_ptrINS1_14ConnectorSplitEE
# 10 _ZN8facebook5velox4exec9TableScan9getOutputEv
# 11 _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 12 _ZN8facebook5velox4exec6Driver4nextERSt10shared_ptrINS1_13BlockingStateEE
# 13 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 14 _ZN6gluten24WholeStageResultIterator4nextEv
# 15 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 16 0x0000ffff9cc677bc
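For context on the failing check: a Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`, and the `strncmp` in `loadFileMetaData` verifies those last four bytes. The sketch below is a minimal, standalone illustration of that check under that assumption about the file layout. It is not Velox's implementation, and `parquetFooterLength` is a hypothetical name. It only shows that if the bytes handed to the reader as the file tail are not the real last bytes of the file (for example, if a cache served a range that does not match the request, which is an assumption and not a confirmed root cause), the magic check fails with exactly this kind of error.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical helper (not the Velox API): validates the trailer of a Parquet
// file tail and returns the footer (FileMetaData) length if it is well formed.
// Assumed layout of the file end: ... <FileMetaData> <4-byte LE length> "PAR1"
std::optional<uint32_t> parquetFooterLength(const std::string& tail) {
  constexpr size_t kMagicSize = 4;    // "PAR1"
  constexpr size_t kTrailerSize = 8;  // 4-byte length + 4-byte magic
  if (tail.size() < kTrailerSize) {
    return std::nullopt;  // too short to contain a trailer at all
  }
  const char* end = tail.data() + tail.size();
  // Same idea as the failing check: the last 4 bytes must be "PAR1".
  if (std::strncmp(end - kMagicSize, "PAR1", kMagicSize) != 0) {
    return std::nullopt;  // "No magic bytes found at end of the Parquet file"
  }
  // The 4 bytes just before the magic hold the footer length, little-endian.
  const auto* p = reinterpret_cast<const unsigned char*>(end - kTrailerSize);
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}
```

For example, a tail built as `"abc"` + `{0x03, 0x00, 0x00, 0x00}` + `"PAR1"` yields a footer length of 3, while the same buffer with its last byte overwritten yields `std::nullopt`, which corresponds to the `INVALID_STATE` error above.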

Spark version

3.3.1

FelixYBW commented 2 weeks ago

The file cache doesn't actually work in Gluten/Velox. There is no unit test (UT) covering it.

Zand100 commented 2 weeks ago

Hi, I would like to work on adding file cache support to Gluten as a way to learn more about the project. Do you think that's feasible for someone new to the project? How would you recommend I get started?

FelixYBW commented 2 weeks ago

@Zand100 I don't think it's a good starting point; this part is too complex. Fully enabling the file cache would require substantial changes to Velox, possibly even a refactor of its cache code.