apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[VL] Failures encountered in using S3a filesystem with latest 1.1.1 released jar #5963

Open deepashreeraghu opened 5 months ago

deepashreeraghu commented 5 months ago

Backend

VL (Velox)

Bug description

[Expected behavior] - It should honor the s3a filesystem and be able to access files.

[Actual behavior] - It fails with the error below:

Reason: No registered file system matched with file path 's3a://perf-data-chstest1/

I am using the jar released at - https://github.com/apache/incubator-gluten/releases/download/v1.1.1/gluten-velox-bundle-spark3.3_2.12-1.1.1.jar

Spark version

Spark-3.3.x

Spark configurations

        "spark.hadoop.fs.s3a.endpoint": "s3.direct.jp-tok.cloud-object-storage.appdomain.cloud",
        "spark.hadoop.fs.s3a.access.key": "XXX",
        "spark.hadoop.fs.s3a.secret.key": "XXX",
        "spark.plugins": "io.glutenproject.GlutenPlugin",
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": "20g",
        "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager"
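
For anyone reproducing this, the settings above can also be passed to `spark-submit` as `--conf key=value` pairs. The following is a minimal sketch in Python; the helper name `to_spark_submit_args` is hypothetical (it is not part of Spark or Gluten), and the credentials stay redacted as in the report.

```python
from typing import Dict, List


def to_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn a Spark config mapping into spark-submit `--conf key=value` arguments.

    Hypothetical helper for illustration only.
    """
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Settings copied from the report above; secrets remain redacted.
REPORTED_CONF = {
    "spark.hadoop.fs.s3a.endpoint": "s3.direct.jp-tok.cloud-object-storage.appdomain.cloud",
    "spark.hadoop.fs.s3a.access.key": "XXX",
    "spark.hadoop.fs.s3a.secret.key": "XXX",
    "spark.plugins": "io.glutenproject.GlutenPlugin",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "20g",
    "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
}

if __name__ == "__main__":
    # Print one argument per line so the list is easy to inspect.
    print("\n".join(to_spark_submit_args(REPORTED_CONF)))
```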

System information

No response

Relevant logs

Reason: No registered file system matched with file path 's3a://perf-data-chstest1/rasika_data/parquet_db/customer/20230907_030906_00076_8uifi_60b91066-caf5-4356-abe1-b9fdabc81a4a'
Retriable: False
Context: Split [Hive: s3a://perf-data-chstest1/rasika_data/parquet_db/customer/20230907_030906_00076_8uifi_60b91066-caf5-4356-abe1-b9fdabc81a4a 0 - 134217728] Task Gluten_Stage_1_TID_1
Top-Level Context: Same as context.
Function: getFileSystem
File: /root/src/oap-project/gluten/ep/build-velox/build/velox_ep/velox/common/file/FileSystems.cpp
Line: 61
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKSsEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox11filesystems13getFileSystemESt17basic_string_viewIcSt11char_traitsIcEESt10shared_ptrIKNS0_6ConfigEE
# 4  _ZN8facebook5velox19FileHandleGeneratorclERKSs
# 5  _ZN8facebook5velox13CachedFactoryISsSt10shared_ptrINS0_10FileHandleEENS0_19FileHandleGeneratorEE8generateERKSs
# 6  _ZN8facebook5velox9connector4hive11SplitReader12prepareSplitESt10shared_ptrINS0_6common14MetadataFilterEERNS0_4dwio6common17RuntimeStatisticsE
# 7  _ZN8facebook5velox9connector4hive14HiveDataSource8addSplitESt10shared_ptrINS1_14ConnectorSplitEE
# 8  _ZN8facebook5velox4exec9TableScan9getOutputEv
# 9  _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 10 _ZN8facebook5velox4exec6Driver4nextERSt10shared_ptrINS1_13BlockingStateEE
# 11 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 12 _ZN6gluten24WholeStageResultIterator4nextEv
# 13 Java_io_glutenproject_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 14 ffi_call_unix64
# 15 ffi_call_int
# 16 _ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread
# 17 bytecodeLoopCompressed
# 18 0x00000000001a3a72

        at io.glutenproject.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:39)
        at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
        at io.glutenproject.utils.InvocationFlowProtection.hasNext(Iterators.scala:135)
        at io.glutenproject.utils.IteratorCompleter.hasNext(Iterators.scala:69)
        at io.glutenproject.utils.PayloadCloser.hasNext(Iterators.scala:35)
        at io.glutenproject.utils.PipelineTimeAccumulator.hasNext(Iterators.scala:98)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator.isEmpty(Iterator.scala:387)
        at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
        at org.apache.spark.InterruptibleIterator.isEmpty(InterruptibleIterator.scala:28)
        at io.glutenproject.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:116)
        at io.glutenproject.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:80)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:855)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:855)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:136)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:839)
Caused by: java.lang.RuntimeException: Exception: VeloxRuntimeError

Yohahaha commented 5 months ago

It seems the released jar was not compiled with S3 enabled.

PHILO-HE commented 5 months ago

@deepashreeraghu, the released jar may not contain S3 support (cc @weiting-chen). You can build Gluten with S3 enabled and then test again.

weiting-chen commented 5 months ago

S3 support is disabled by default, and the released jar does not enable it either. If you need S3 support, please compile the Gluten jar from source with the parameter below turned on: https://github.com/apache/incubator-gluten/blob/main/dev/builddeps-veloxbe.sh#L23

After you compile the jar with S3 support, you can follow this guide to use S3: https://gluten.apache.org/docs/velox/s3#working-with-s3
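
To illustrate why a build without S3 support produces the "No registered file system matched" error, here is a simplified Python model of a file system registry that resolves a path by asking each registered matcher in turn. This is only a sketch of the general idea, not Velox's actual code: the class `FileSystemRegistry`, the matcher functions, and the set of recognized prefixes are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def is_local_path(path: str) -> bool:
    # Assumed matcher: treat absolute paths and file: URIs as local.
    return path.startswith("/") or path.startswith("file:")


def is_s3_path(path: str) -> bool:
    # Assumed matcher: a small, illustrative set of S3-style schemes.
    return path.startswith(("s3://", "s3a://"))


class FileSystemRegistry:
    """Hypothetical registry that maps a path to the first matching file system."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, Callable[[str], bool]]] = []

    def register(self, name: str, matcher: Callable[[str], bool]) -> None:
        self._entries.append((name, matcher))

    def get_file_system(self, path: str) -> str:
        for name, matcher in self._entries:
            if matcher(path):
                return name
        # Mirrors the wording of the error in the report above.
        raise RuntimeError(
            f"No registered file system matched with file path '{path}'"
        )


if __name__ == "__main__":
    # A build without S3 registers only the local file system.
    registry = FileSystemRegistry()
    registry.register("local", is_local_path)
    try:
        registry.get_file_system("s3a://some-bucket/some-key")
    except RuntimeError as err:
        print(err)

    # A build with S3 enabled also registers an S3 file system.
    registry.register("s3", is_s3_path)
    print(registry.get_file_system("s3a://some-bucket/some-key"))
```

In this model, the Spark-side `fs.s3a.*` settings make no difference until an S3 file system is registered, which matches the advice above to rebuild with S3 enabled.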

deepashreeraghu commented 5 months ago

Sure, thank you. I will try building it locally.