apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[VL] Flaky TPCDS q24a #6930

Open zhouyuan opened 1 month ago

zhouyuan commented 1 month ago

Backend

VL (Velox)

Bug description

https://github.com/apache/incubator-gluten/pull/6928#issuecomment-2297738896

Executing SQL query from resource path /tpcds-queries/q24b.sql...
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fd86d9bd966, pid=6067, tid=0x00007fd874626700
#
# JRE version: OpenJDK Runtime Environment (8.0_312-b07) (build 1.8.0_312-b07)
# Java VM: OpenJDK 64-Bit Server VM (25.312-b07 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libvelox.so+0x1807966]  facebook::velox::memory::ScopedMemoryPoolArbitrationCtx::~ScopedMemoryPoolArbitrationCtx()+0x6
#
# Core dump written. Default location: /__w/incubator-gluten/incubator-gluten/tools/gluten-it/core or core.6067
#
# An error report file with more information is saved as:
# /__w/incubator-gluten/incubator-gluten/tools/gluten-it/hs_err_pid6067.log
#
# If you would like to submit a bug report, please visit:
#   https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%208&component=java-1.8.0-openjdk
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
sbin/gluten-it.sh: line 50:  6067 Aborted                 (core dumped) $JAVA_HOME/bin/java $SPARK_JVM_OPTIONS $GLUTEN_IT_JVM_ARGS -XX:ErrorFile=/var/log/java/hs_err_pid%p.log -Dio.netty.tryReflectionSetAccessible=true -cp $JAR_PATH org.apache.gluten.integration.Cli $@
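For triaging repeated CI crashes like this one, it can help to pull the problematic frame out of each log automatically and compare them across runs. The following is a minimal sketch, not part of Gluten or Velox: the function name `parse_problematic_frame` and the regular expression are hypothetical, and the line layout is assumed from the single crash header above (a native `C` frame); other frame types may be laid out differently.

```python
import re

# Assumed layout, taken from the crash header in this issue:
#   "# C  [libvelox.so+0x1807966]  some::symbol()+0x6"
FRAME_RE = re.compile(
    r"^#\s+(?P<kind>\w)\s+"
    r"\[(?P<library>[^\]]+?)\+(?P<offset>0x[0-9a-fA-F]+)\]\s+"
    r"(?P<symbol>.*?)(?:\+(?P<symbol_offset>0x[0-9a-fA-F]+))?\s*$"
)


def parse_problematic_frame(log_text):
    """Return a dict describing the line after '# Problematic frame:', or None."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "# Problematic frame:" and i + 1 < len(lines):
            match = FRAME_RE.match(lines[i + 1])
            # None means the frame line did not fit the assumed layout.
            return match.groupdict() if match else None
    return None
```

Feeding it the console output or the `hs_err_pid*.log` contents from two failing runs makes it easy to check whether both crashed at the same library offset and symbol.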

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

zhztheplayer commented 1 month ago

Another occurrence: https://github.com/apache/incubator-gluten/actions/runs/10465766725/job/28981570997?pr=6931

FelixYBW commented 1 month ago

Could this be caused by a recent Velox PR?

zhztheplayer commented 1 month ago

I'm not sure which PR caused it, but the issue started appearing within the last several days. I haven't been able to reproduce it locally yet, but I'm still trying.

zhztheplayer commented 1 month ago

Update: I ran q23a + q24b for 100 rounds locally and could not reproduce the crash.
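A repeat-until-failure run of this kind can be scripted so that it stops at the first crashing round. This is only a sketch under stated assumptions: `first_failing_round` is a hypothetical helper, not part of gluten-it, and the command to run is left to the caller rather than guessed here.

```python
import subprocess


def first_failing_round(cmd, rounds=100):
    """Run `cmd` up to `rounds` times.

    Returns (round_number, returncode) for the first non-zero exit,
    or None if every round exits with 0.
    """
    for round_number in range(1, rounds + 1):
        result = subprocess.run(cmd)
        # A non-zero code covers both ordinary failures and crashes.
        # On POSIX, a child killed directly by a signal is reported as a
        # negative returncode; a shell wrapper usually reports 128 + signal.
        if result.returncode != 0:
            return round_number, result.returncode
    return None
```

Pass the same argument list you would use to launch the q23a + q24b run, for example `first_failing_round(["sbin/gluten-it.sh", ...], rounds=100)` with your own arguments in place of the `...`. Stopping at the first failure keeps the core dump and `hs_err` log from that round from being overwritten by later rounds.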

FelixYBW commented 1 month ago

Does it still appear now?