apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[VL] 1.2.0 on Ubuntu 22.04: simple queries on ORC files hang forever (S3 *and* GCS) #7200

Open acruise opened 2 months ago

acruise commented 2 months ago

Backend

VL (Velox)

Bug description

I have a TPC-DS dataset in ORC format, duplicated on S3 and GCS. On vanilla Spark 3.5.1 on a single node, these queries complete in 1-3 seconds:

S3:

val customers = spark.read.orc("s3a://my-s3-bucket/datasets/tpcds_sf100.orc/customer/*.orc").toDF
customers.count()

GCS:

val customers = spark.read.orc("gs://my-gs-bucket/datasets/tpcds_sf100.orc/customer/*.orc").toDF
customers.count()

With Gluten enabled, initializing the DataFrame works fine, but when I invoke count(), the expected number of tasks is spawned and then they do nothing at all. I've tried Gluten builds from the 1.2.0 tag, as well as from 73100f49a7.

I've tried disabling whole-stage codegen, but it makes no difference.
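
(For reference, whole-stage codegen can be disabled with Spark's standard config flag; a minimal sketch, where the placement on the command lines below is illustrative:)

# standard Spark SQL setting, appended as an extra option to any of the spark-shell invocations below
-c spark.sql.codegen.wholeStage=false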

Spark version

Spark-3.5.1

Spark configurations

For S3:

/opt/spark/bin/spark-shell \
    --jars /home/alex/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_x86_64-1.2.0.jar \
    --packages org.apache.hadoop:hadoop-aws:3.3.4 \
    -c spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain \
    -c spark.plugins=org.apache.gluten.GlutenPlugin \
    -c spark.memory.offHeap.enabled=true \
    -c spark.memory.offHeap.size=32G \
    -c spark.driver.memory=8g \
    -c spark.executor.memory=16g

For GCS:

/opt/spark/bin/spark-shell \
    --jars /home/alex/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_x86_64-1.3.0-SNAPSHOT.jar,/home/alex/gcs-connector-3.0.2-shaded.jar \
    -c spark.plugins=org.apache.gluten.GlutenPlugin \
    -c spark.memory.offHeap.enabled=true \
    -c spark.memory.offHeap.size=32G \
    -c spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
    -c spark.hadoop.fs.gs.auth.type=SERVICE_ACCOUNT_JSON_KEYFILE \
    -c spark.hadoop.fs.gs.auth.service.account.json.keyfile=/home/alex/gcp.key

System information

It's a c5a.8xlarge (64 GB RAM, 32 cores, >100 GB local disk), running Ubuntu 22.04 with nearly all distro updates applied.

$ dev/info.sh 

Velox System Info v0.0.2
Commit: 73100f49a705f838a3e3de655c265d8c83809fde
CMake Version: 3.28.3
System: Linux-6.5.0-1022-aws
Arch: x86_64
CPU Name: Model name:                         AMD EPYC 7R32
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.10/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

The results will be copied to your clipboard if xclip is installed.

Relevant logs

S3:

scala> customers.count
24/09/11 22:36:32 INFO FileSourceStrategy: Pushed Filters: 
24/09/11 22:36:32 INFO FileSourceStrategy: Post-Scan Filters: 
24/09/11 22:36:32 INFO GlutenFallbackReporter: Validation failed for plan: Exchange[QueryId=0], due to: [FallbackByBackendSettings] Validation failed on node Exchange.
24/09/11 22:36:32 INFO InputPartitionsUtil: Planning scan with bin packing, max size: 6213148 bytes, open cost is considered as scanning 4194304 bytes.
24/09/11 22:36:32 INFO DAGScheduler: Registering RDD 5 (count at <console>:27) as input to shuffle 0
24/09/11 22:36:32 INFO DAGScheduler: Got map stage job 1 (count at <console>:27) with 22 output partitions
24/09/11 22:36:32 INFO DAGScheduler: Final stage: ShuffleMapStage 1 (count at <console>:27)
24/09/11 22:36:32 INFO DAGScheduler: Parents of final stage: List()
24/09/11 22:36:32 INFO DAGScheduler: Missing parents: List()
24/09/11 22:36:32 INFO DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[5] at count at <console>:27), which has no missing parents
24/09/11 22:36:32 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 32.9 KiB, free 36.6 GiB)
24/09/11 22:36:32 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 14.6 KiB, free 36.6 GiB)
24/09/11 22:36:32 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-0-251.us-west-1.compute.internal:38299 (size: 14.6 KiB, free: 36.6 GiB)
24/09/11 22:36:32 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1585
24/09/11 22:36:32 INFO DAGScheduler: Submitting 22 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[5] at count at <console>:27) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
24/09/11 22:36:32 INFO TaskSchedulerImpl: Adding task set 1.0 with 22 tasks resource profile 0
24/09/11 22:36:32 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 9202 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 1, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 2, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 4) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 3, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 5) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 4, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 6) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 5, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID 7) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 6, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 7.0 in stage 1.0 (TID 8) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 7, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 8.0 in stage 1.0 (TID 9) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 8, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 9.0 in stage 1.0 (TID 10) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 9, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 10.0 in stage 1.0 (TID 11) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 10, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 11.0 in stage 1.0 (TID 12) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 11, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 12.0 in stage 1.0 (TID 13) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 12, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 13.0 in stage 1.0 (TID 14) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 13, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 14.0 in stage 1.0 (TID 15) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 14, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 15.0 in stage 1.0 (TID 16) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 15, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 16.0 in stage 1.0 (TID 17) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 16, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 17.0 in stage 1.0 (TID 18) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 17, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 18.0 in stage 1.0 (TID 19) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 18, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 19.0 in stage 1.0 (TID 20) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 19, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 20.0 in stage 1.0 (TID 21) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 20, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO TaskSetManager: Starting task 21.0 in stage 1.0 (TID 22) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 21, PROCESS_LOCAL, 9204 bytes) 
24/09/11 22:36:32 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
24/09/11 22:36:32 INFO Executor: Running task 1.0 in stage 1.0 (TID 2)
24/09/11 22:36:32 INFO Executor: Running task 2.0 in stage 1.0 (TID 3)
24/09/11 22:36:32 INFO Executor: Running task 3.0 in stage 1.0 (TID 4)
24/09/11 22:36:32 INFO Executor: Running task 4.0 in stage 1.0 (TID 5)
24/09/11 22:36:32 INFO Executor: Running task 5.0 in stage 1.0 (TID 6)
24/09/11 22:36:32 INFO Executor: Running task 6.0 in stage 1.0 (TID 7)
24/09/11 22:36:32 INFO Executor: Running task 7.0 in stage 1.0 (TID 8)
24/09/11 22:36:32 INFO Executor: Running task 8.0 in stage 1.0 (TID 9)
24/09/11 22:36:32 INFO Executor: Running task 9.0 in stage 1.0 (TID 10)
24/09/11 22:36:32 INFO Executor: Running task 10.0 in stage 1.0 (TID 11)
24/09/11 22:36:32 INFO Executor: Running task 11.0 in stage 1.0 (TID 12)
24/09/11 22:36:32 INFO Executor: Running task 12.0 in stage 1.0 (TID 13)
24/09/11 22:36:32 INFO Executor: Running task 13.0 in stage 1.0 (TID 14)
24/09/11 22:36:32 INFO Executor: Running task 14.0 in stage 1.0 (TID 15)
24/09/11 22:36:32 INFO Executor: Running task 15.0 in stage 1.0 (TID 16)
24/09/11 22:36:32 INFO Executor: Running task 16.0 in stage 1.0 (TID 17)
24/09/11 22:36:32 INFO Executor: Running task 17.0 in stage 1.0 (TID 18)
24/09/11 22:36:32 INFO Executor: Running task 18.0 in stage 1.0 (TID 19)
24/09/11 22:36:32 INFO Executor: Running task 19.0 in stage 1.0 (TID 20)
24/09/11 22:36:32 INFO Executor: Running task 20.0 in stage 1.0 (TID 21)
24/09/11 22:36:32 INFO Executor: Running task 21.0 in stage 1.0 (TID 22)
[Stage 0:>                  (0 + 1) / 1][Stage 1:>                (0 + 22) / 22]

GCS:

scala> customers.count
24/09/12 18:17:39 INFO FileSourceStrategy: Pushed Filters: 
24/09/12 18:17:39 INFO FileSourceStrategy: Post-Scan Filters: 
24/09/12 18:17:39 WARN GlutenFallbackReporter: Validation failed for plan: Exchange[QueryId=0], due to: [FallbackByBackendSettings] Validation failed on node Exchange.
24/09/12 18:17:40 INFO InputPartitionsUtil: Planning scan with bin packing, max size: 6213148 bytes, open cost is considered as scanning 4194304 bytes.
24/09/12 18:17:40 INFO DAGScheduler: Registering RDD 2 (count at <console>:24) as input to shuffle 0
24/09/12 18:17:40 INFO DAGScheduler: Got map stage job 0 (count at <console>:24) with 22 output partitions
24/09/12 18:17:40 INFO DAGScheduler: Final stage: ShuffleMapStage 0 (count at <console>:24)
24/09/12 18:17:40 INFO DAGScheduler: Parents of final stage: List()
24/09/12 18:17:40 INFO DAGScheduler: Missing parents: List()
24/09/12 18:17:40 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[2] at count at <console>:24), which has no missing parents
24/09/12 18:17:40 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.6 KiB, free 32.4 GiB)
24/09/12 18:17:40 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 14.7 KiB, free 32.4 GiB)
24/09/12 18:17:40 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-0-251.us-west-1.compute.internal:36357 (size: 14.7 KiB, free: 32.4 GiB)
24/09/12 18:17:40 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1585
24/09/12 18:17:40 INFO DAGScheduler: Submitting 22 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[2] at count at <console>:24) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
24/09/12 18:17:40 INFO TaskSchedulerImpl: Adding task set 0.0 with 22 tasks resource profile 0
24/09/12 18:17:40 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 8892 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 1, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 2, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 3, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 4, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 5, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 6, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 7, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 8, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 9, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 10.0 in stage 0.0 (TID 10) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 10, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 11, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 12.0 in stage 0.0 (TID 12) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 12, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 13.0 in stage 0.0 (TID 13) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 13, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 14.0 in stage 0.0 (TID 14) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 14, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 15.0 in stage 0.0 (TID 15) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 15, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 16.0 in stage 0.0 (TID 16) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 16, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 17.0 in stage 0.0 (TID 17) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 17, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 18.0 in stage 0.0 (TID 18) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 18, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 19.0 in stage 0.0 (TID 19) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 19, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 20.0 in stage 0.0 (TID 20) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 20, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO TaskSetManager: Starting task 21.0 in stage 0.0 (TID 21) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 21, PROCESS_LOCAL, 8894 bytes) 
24/09/12 18:17:40 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
24/09/12 18:17:40 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
24/09/12 18:17:40 INFO Executor: Running task 5.0 in stage 0.0 (TID 5)
24/09/12 18:17:40 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
24/09/12 18:17:40 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
24/09/12 18:17:40 INFO Executor: Running task 6.0 in stage 0.0 (TID 6)
24/09/12 18:17:40 INFO Executor: Running task 4.0 in stage 0.0 (TID 4)
24/09/12 18:17:40 INFO Executor: Running task 8.0 in stage 0.0 (TID 8)
24/09/12 18:17:40 INFO Executor: Running task 10.0 in stage 0.0 (TID 10)
24/09/12 18:17:40 INFO Executor: Running task 9.0 in stage 0.0 (TID 9)
24/09/12 18:17:40 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
24/09/12 18:17:40 INFO Executor: Running task 11.0 in stage 0.0 (TID 11)
24/09/12 18:17:40 INFO Executor: Running task 13.0 in stage 0.0 (TID 13)
24/09/12 18:17:40 INFO Executor: Running task 12.0 in stage 0.0 (TID 12)
24/09/12 18:17:40 INFO Executor: Running task 14.0 in stage 0.0 (TID 14)
24/09/12 18:17:40 INFO Executor: Running task 15.0 in stage 0.0 (TID 15)
24/09/12 18:17:40 INFO Executor: Running task 16.0 in stage 0.0 (TID 16)
24/09/12 18:17:40 INFO Executor: Running task 17.0 in stage 0.0 (TID 17)
24/09/12 18:17:40 INFO Executor: Running task 18.0 in stage 0.0 (TID 18)
24/09/12 18:17:40 INFO Executor: Running task 19.0 in stage 0.0 (TID 19)
24/09/12 18:17:40 INFO Executor: Running task 20.0 in stage 0.0 (TID 20)
24/09/12 18:17:40 INFO Executor: Running task 21.0 in stage 0.0 (TID 21)
W0912 18:17:40.602993 411440 GCSFileSystem.cpp:303] Config::gcsCredentials is empty
[Stage 0:>                                                        (0 + 22) / 22]
acruise commented 2 months ago

I rebuilt Gluten from HEAD of main as of ~1h ago and get the same result; here's my command line:

/opt/spark/bin/spark-shell \
    --jars ~/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_x86_64-1.3.0-SNAPSHOT.jar \
    --packages org.apache.hadoop:hadoop-aws:3.3.4 \
    -c spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain \
    -c spark.plugins=org.apache.gluten.GlutenPlugin \
    -c spark.memory.offHeap.enabled=true \
    -c spark.memory.offHeap.size=32G

edit: after a few minutes I got this log:

24/09/12 00:05:42 WARN GlutenFallbackReporter: Validation failed for plan: Exchange[QueryId=0], due to: [FallbackByBackendSettings] Validation failed on node Exchange.
acruise commented 2 months ago

Maybe unrelated: when I Ctrl-C the spark shell, I get this:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007408d6f8555d, pid=325442, tid=325442
#
# JRE version: OpenJDK Runtime Environment (17.0.12+7) (build 17.0.12+7-Ubuntu-1ubuntu222.04)
# Java VM: OpenJDK 64-Bit Server VM (17.0.12+7-Ubuntu-1ubuntu222.04, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libvelox.so+0x558555d]  Aws::Http::CleanupHttp()+0x4d
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/alex/incubator-gluten/core.325442)
#
# An error report file with more information is saved as:
# /home/alex/incubator-gluten/hs_err_pid325442.log
#
# If you would like to submit a bug report, please visit:
#   https://bugs.launchpad.net/ubuntu/+source/openjdk-17
#

My JVM is:

$ java -version
openjdk version "17.0.12" 2024-07-16
OpenJDK Runtime Environment (build 17.0.12+7-Ubuntu-1ubuntu222.04)
OpenJDK 64-Bit Server VM (build 17.0.12+7-Ubuntu-1ubuntu222.04, mixed mode, sharing)
acruise commented 2 months ago

Here are some stack dumps from selected threads that I think are interesting.
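
(For reference, dumps like these can be captured with the standard JDK tools; a minimal sketch, where the pid handling is illustrative:)

# find the spark-shell / SparkSubmit JVM pid, then dump all Java thread stacks
jps -lm | grep SparkSubmit
jstack <pid> > threads.txt        # or: jcmd <pid> Thread.print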

There are ~22 threads in ColumnarBatchOutIterator.nativeHasNext but I haven't checked whether all the stack traces are the same:

"Executor task launch worker for task 21.0 in stage 0.0 (TID 21)" #154 daemon prio=5 os_prio=0 cpu=114.55ms elapsed=8.06s tid=0x00007aa8ac03f050 nid=0x62f8e runnable  [0x00007aa7dc5fd000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
        at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNextInternal(ColumnarBatchOutIterator.java:61)
        at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:37)
        at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
        at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
        at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
        at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
        at org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator.isEmpty(Iterator.scala:387)
        at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
        at org.apache.spark.InterruptibleIterator.isEmpty(InterruptibleIterator.scala:28)
        at org.apache.gluten.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:124)
        at org.apache.gluten.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:79)
        at org.apache.gluten.execution.VeloxColumnarToRowExec$$Lambda$4616/0x00007aa9bd4e47c0.apply(Unknown Source)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:858)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:858)
        at org.apache.spark.rdd.RDD$$Lambda$4617/0x00007aa9bd4b3c90.apply(Unknown Source)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
        at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
        at org.apache.spark.executor.Executor$TaskRunner$$Lambda$4178/0x00007aa9bd4585a0.apply(Unknown Source)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.12/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.12/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(java.base@17.0.12/Thread.java:840)

And one of these:

"s3a-transfer-alex-dev-testing-unbounded-pool2-t1" #122 daemon prio=5 os_prio=0 cpu=9.87ms elapsed=11.35s tid=0x00007aa794760770 nid=0x62f65 waiting on condition  [0x00007aa7de79b000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@17.0.12/Native Method)
        - parking to wait for  <0x00000000c8664748> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.12/LockSupport.java:252)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@17.0.12/AbstractQueuedSynchronizer.java:1674)
        at java.util.concurrent.LinkedBlockingQueue.poll(java.base@17.0.12/LinkedBlockingQueue.java:460)
        at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@17.0.12/ThreadPoolExecutor.java:1061)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.12/ThreadPoolExecutor.java:1122)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.12/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(java.base@17.0.12/Thread.java:840)

None of the other stacks contain any frames from an org.apache.gluten class.

There are eight of these:

"map-output-dispatcher-0" #66 daemon prio=5 os_prio=0 cpu=0.17ms elapsed=20.02s tid=0x00007aaa3721c950 nid=0x62f14 waiting on condition  [0x00007aa9ba421000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@17.0.12/Native Method)
        - parking to wait for  <0x00000000c865e7f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(java.base@17.0.12/LockSupport.java:341)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@17.0.12/AbstractQueuedSynchronizer.java:506)
        at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.12/ForkJoinPool.java:3465)
        at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.12/ForkJoinPool.java:3436)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@17.0.12/AbstractQueuedSynchronizer.java:1625)
        at java.util.concurrent.LinkedBlockingQueue.take(java.base@17.0.12/LinkedBlockingQueue.java:435)
        at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:759)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.12/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.12/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(java.base@17.0.12/Thread.java:840)

I'll stop for now. :)

acruise commented 2 months ago

I tried the same query with the same dataset on a local filesystem, and I get the same GlutenFallbackReporter warning but the query completes successfully. I'll try GCS next to isolate whether S3 per se is an issue.

acruise commented 2 months ago

On GCS, I get this log, even though I've configured gcloud auth according to the doc.

W0912 18:17:40.602993 411440 GCSFileSystem.cpp:303] Config::gcsCredentials is empty

Without the Gluten jar and plugin on the spark command line, this query is successful reading from GCS.

surnaik commented 2 months ago

Hi @acruise, for the GCS run, is it running on a GCE VM under the same project as the GCS bucket? If not, you would have to provide a service account json key file to get around the auth.

acruise commented 2 months ago

you would have to provide a service account json key file to get around the auth.

I always do, as you can see from my "For GCS" section above. If I were trying to run without credentials, I would expect to get an exception immediately rather than an eternal wait. :)

LoseYSelf commented 2 months ago

You can attach gdb to the JVM process and show the C++ call stack.
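
For example, a minimal sketch (the pid is illustrative; find it with jps):

# attach to the hung driver/executor JVM and dump the native (C++) stacks
# (sudo may be needed depending on the kernel's ptrace settings)
sudo gdb -p <pid>
(gdb) set pagination off
(gdb) thread apply all bt
(gdb) detach
(gdb) quit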