Open zhouyifan279 opened 6 months ago
Verified that adding the JVM option -XX:+StartAttachListener makes jstack work:
./bin/spark-sql \
--conf spark.plugins=org.apache.gluten.GlutenPlugin \
--conf spark.memory.offHeap.enabled=true \
--conf spark.memory.offHeap.size=20g \
--conf spark.driver.extraClassPath=${gluten_jar} \
--conf spark.executor.extraClassPath=${gluten_jar} \
--conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
--conf spark.driver.extraJavaOptions=-XX:+StartAttachListener
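To confirm that the option actually reached the driver JVM, the JVM's input arguments can be inspected from inside the process. The following is only a minimal sketch: the class name AttachFlagCheck and the helper hasStartAttachListener are made up for illustration, and it checks the JVM it runs in, so for Spark the same call would have to be made from code running in the driver.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class, not part of Spark or Gluten.
final class AttachFlagCheck {
    // True if the exact flag string appears among the JVM input arguments.
    static boolean hasStartAttachListener(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+StartAttachListener");
    }

    public static void main(String[] args) {
        // RuntimeMXBean exposes the options the current JVM was launched with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("StartAttachListener requested: " + hasStartAttachListener(jvmArgs));
    }
}
```

Running it with `java -XX:+StartAttachListener AttachFlagCheck` should report that the flag was requested; without the option it should report the opposite.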
According to this doc, jstack communicates with the JVM through a local socket file under the JVM tmpdir, named with the pattern .java_pid<pid>.
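A quick way to check for that socket file is a small program like the one below. This is a sketch under assumptions: the class name AttachSocketCheck is hypothetical, the file name is assumed to be .java_pid followed by the process ID, and the directory is assumed to match the java.io.tmpdir of the checking JVM, which may differ from the directory the target JVM really uses.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class for illustration only.
final class AttachSocketCheck {
    // Builds the expected socket path: <tmpDir>/.java_pid<pid> (assumed naming).
    static Path socketPath(String tmpDir, long pid) {
        return Paths.get(tmpDir, ".java_pid" + pid);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java AttachSocketCheck <pid>");
            System.exit(2);
        }
        long pid = Long.parseLong(args[0]);
        // Assumption: the target JVM uses the same temp directory as this JVM.
        Path path = socketPath(System.getProperty("java.io.tmpdir"), pid);
        System.out.println(path + (Files.exists(path) ? " is present" : " is not present"));
    }
}
```

Run it with the PID of the target process before and after calling jstack to see whether the file appears.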
I ran the following test cases and observed different behavior of the .java_pid file:
1. -XX:+StartAttachListener is specified: the .java_pid file is present when the JVM starts.
2. -XX:+StartAttachListener is not specified and --conf spark.plugins=org.apache.gluten.GlutenPlugin is removed: the .java_pid file is present after executing jstack.
3. -XX:+StartAttachListener is not specified and --conf spark.plugins=org.apache.gluten.GlutenPlugin is present: the .java_pid file is not present even after executing jstack.
A simplified program can reproduce this bug:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class AttachListener {
    public static void main(String[] args) throws InterruptedException {
        // Load libvelox.dylib into this JVM, then idle so jstack can be run against the process.
        File file = extractVeloxLibrary();
        System.load(file.getAbsolutePath());
        System.out.println("Library velox loaded");
        Thread.sleep(Long.MAX_VALUE);
    }

    // Copies libvelox.dylib from the classpath (the Gluten bundle jar) into java.io.tmpdir.
    static File extractVeloxLibrary() {
        String tmpdir = System.getProperty("java.io.tmpdir");
        File file = new File(tmpdir, "libvelox.dylib");
        if (file.exists()) {
            file.delete();
        }
        try (InputStream is = AttachListener.class.getResourceAsStream("/libvelox.dylib");
             FileOutputStream fos = new FileOutputStream(file)) {
            // getResourceAsStream returns null when the resource is missing.
            if (is == null) {
                throw new IOException("libvelox.dylib not found on the classpath");
            }
            byte[] buffer = new byte[4096];
            int read;
            while ((read = is.read(buffer)) != -1) {
                fos.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new RuntimeException("Failed to extract library", e);
        }
        return file;
    }
}
Compile and run:
javac AttachListener.java
java -cp /path/to/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-osx_14.4_aarch_64-1.2.0-SNAPSHOT.jar:/path/to/spark-3.5.1-bin-hadoop3/jars/*:. AttachListener
jstack also fails on the AttachListener process.
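The failure can also be probed without jstack by going through the JDK Attach API, which relies on the same attach mechanism. The sketch below is an illustration only: the class name AttachProbe is hypothetical, and on JDK 8 it needs the JDK's tools.jar on the classpath because com.sun.tools.attach lives there.

```java
import com.sun.tools.attach.AttachNotSupportedException;
import com.sun.tools.attach.VirtualMachine;
import java.io.IOException;

// Hypothetical helper class for illustration only.
final class AttachProbe {
    // Tries to attach to the given PID and returns a one-line description of the outcome.
    static String probe(String pid) {
        try {
            VirtualMachine vm = VirtualMachine.attach(pid);
            vm.detach();
            return "attach to " + pid + " succeeded";
        } catch (AttachNotSupportedException | IOException e) {
            // Keep the exception text: it carries the reason reported by the attach layer.
            return "attach to " + pid + " failed: " + e;
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java AttachProbe <pid>");
            System.exit(2);
        }
        System.out.println(probe(args[0]));
    }
}
```

Running it against the AttachListener process with and without -XX:+StartAttachListener gives a second data point next to the jstack result.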
I guess libvelox.dylib interferes with the JVM's internal mechanism, but I'm not a JVM expert and know little about libvelox.dylib, so I can't dig deeper to find the root cause.
The OpenJDK project has a similar issue, https://bugs.openjdk.org/browse/JDK-8235211, but it does not seem relevant.
I am using macOS (Apple Silicon) with this JDK:
openjdk version "1.8.0_402"
OpenJDK Runtime Environment (Zulu 8.76.0.17-CA-macos-aarch64) (build 1.8.0_402-b06)
OpenJDK 64-Bit Server VM (Zulu 8.76.0.17-CA-macos-aarch64) (build 25.402-b06, mixed mode)
and jstack works
Backend
VL (Velox)
Bug description
Launch spark-sql in local mode and run jstack against it: jstack exits with an error message.
Spark version
spark-3.5.1-bin-hadoop3
Spark configurations
No response
System information
JDK
System
Relevant logs
No response