adoptium / adoptium-support

For end-user problems reported with our binary distributions

SIGSEGV at G1ParScanThreadState::trim_queue_to_threshold with Temurin-21.0.3+9 #1129

Open jjscl8888 opened 4 months ago

jjscl8888 commented 4 months ago

Please provide a brief summary of the bug

The JVM occasionally crashes with a SIGSEGV in a GC worker thread when the application uses virtual threads.

For example:

A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0x00007f9e12aa3d5c, pid=41, tid=192
JRE version: OpenJDK Runtime Environment Temurin-21.0.3+9 (21.0.3+9) (build 21.0.3+9-LTS)
Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.3+9 (21.0.3+9-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
Problematic frame:
V  [libjvm.so+0x813d5c]  G1ParScanThreadState::trim_queue_to_threshold(unsigned int)+0x357c
Core dump will be written. Default location: /opt/tomcat/core.41
An error report file with more information is saved as:
/opt/tomcat/hs_err_pid41.log
[thread 94 also had an error]
[thread 203 also had an error]
[thread 195 also had an error]
[thread 199 also had an error]
[thread 197 also had an error]

Did you test with the latest update version?

Please provide steps to reproduce where possible

No response

Expected Results

According to https://bugs.openjdk.org/browse/JDK-8320253, JDK-8320253 has been fixed in JDK 22. Is it possible to fix this issue in JDK 21 as well?

Actual Results

Occasional SIGSEGV at G1ParScanThreadState::trim_queue_to_threshold with Temurin-21.0.3+9.

What Java Version are you using?

openjdk version "21.0.3" 2024-04-16 LTS
OpenJDK Runtime Environment Temurin-21.0.3+9 (build 21.0.3+9-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.3+9 (build 21.0.3+9-LTS, mixed mode, sharing)
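
For anyone comparing their own runtime against the build reported here, the following is a minimal sketch using the standard `Runtime.Version` API. The class name `VersionCheck` and its helper methods are made up for illustration; only the version string `21.0.3+9-LTS` comes from this report. It only tells you whether the running JVM is newer than the reported build, not whether a given build contains a fix.

```java
public class VersionCheck {
    // The build reported in this issue, parsed with the standard Runtime.Version API.
    static Runtime.Version reported() {
        return Runtime.Version.parse("21.0.3+9-LTS");
    }

    // True if the given runtime version is strictly newer than the reported build.
    // (Hypothetical helper, not part of any JDK or Adoptium tooling.)
    static boolean isNewerThanReported(Runtime.Version running) {
        return running.compareTo(reported()) > 0;
    }

    public static void main(String[] args) {
        Runtime.Version running = Runtime.version();
        System.out.println("Running:  " + running);
        System.out.println("Reported: " + reported());
        System.out.println("Newer than reported build: " + isNewerThanReported(running));
    }
}
```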

What is your operating system and platform?

Host: AMD EPYC 7T83 64-Core Processor, 64 cores, 32G, CentOS Linux release 7.9.2009 (Core)

How did you install Java?

with binary archive

Did it work before?

No response

Did you test with other Java versions?

No

Relevant log output

--------------  T H R E A D  ---------------

Current thread (0x00007f9dadc21000):  WorkerThread "GC Thread#1"    [id=192, stack(0x00007f9daad00000,0x00007f9daae00000) (1024K)]

Stack: [0x00007f9daad00000,0x00007f9daae00000],  sp=0x00007f9daadfd200,  free space=1012k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x813d5c]  G1ParScanThreadState::trim_queue_to_threshold(unsigned int)+0x357c
V  [libjvm.so+0x82a7d6]  G1ScanHRForRegionClosure::scan_heap_roots(HeapRegion*)+0x456
V  [libjvm.so+0x8272e8]  G1RemSet::scan_heap_roots(G1ParScanThreadState*, unsigned int, G1GCPhaseTimes::GCParPhases, G1GCPhaseTimes::GCParPhases, bool)+0x1e8
V  [libjvm.so+0x84b07b]  G1EvacuateRegionsTask::scan_roots(G1ParScanThreadState*, unsigned int)+0x4b
V  [libjvm.so+0x84b2b9]  G1EvacuateRegionsBaseTask::work(unsigned int)+0x89
V  [libjvm.so+0x1031d90]  WorkerThread::run()+0x80
V  [libjvm.so+0xf753b8]  Thread::call_run()+0xa8
V  [libjvm.so+0xd0012a]  thread_native_entry(Thread*)+0xda

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000100

Registers:
RAX=0x0000000000000000, RBX=0x00007f9da8f12e00, RCX=0x0000000000000000, RDX=0x0000000000000001
RSP=0x00007f9daadfd200, RBP=0x00007f9daadfd2a0, RSI=0x00007f9e1376efa0, RDI=0x00007f9e125731b0
R8 =0x0000000000000000, R9 =0x0000000000000000, R10=0x0000000000000000, R11=0x0000000000000000
R12=0x000000077a95ec8d, R13=0x000000045862e838, R14=0x00007f9e13770808, R15=0x0000000000000001
RIP=0x00007f9e12aa3d5c, EFLAGS=0x0000000000010246, CSGSFS=0x002b000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e

Register to memory mapping:

RAX=0x0 is null
RBX=0x00007f9da8f12e00 points into unknown readable memory: 0x00007f9e136a5e58 | 58 5e 6a 13 9e 7f 00 00
RCX=0x0 is null
RDX=0x0000000000000001 is an unknown value
RSP=0x00007f9daadfd200 points into unknown readable memory: 0x0000000000000003 | 03 00 00 00 00 00 00 00
RBP=0x00007f9daadfd2a0 points into unknown readable memory: 0x00007f9daadfd3e0 | e0 d3 df aa 9d 7f 00 00
RSI=0x00007f9e1376efa0: <offset 0x00000000014defa0> in /usr/java/jdk21/lib/server/libjvm.so at 0x00007f9e12290000
RDI=0x00007f9e125731b0: <offset 0x00000000002e31b0> in /usr/java/jdk21/lib/server/libjvm.so at 0x00007f9e12290000
R8 =0x0 is null
R9 =0x0 is null
R10=0x0 is null
R11=0x0 is null
R12=
[error occurred during error reporting (printing register info), id 0xb, SIGSEGV (0xb) at pc=0x00007f9e12a55dc3]
R13=0x000000045862e838 is pointing into object: java.util.concurrent.locks.ReentrantLock$NonfairSync 
{0x000000045862e828} - klass: 'java/util/concurrent/locks/ReentrantLock$NonfairSync'
 - ---- fields (total size 4 words):
 - private transient 'exclusiveOwnerThread' 'Ljava/lang/Thread;' @12  null (0x00000000)
 - private volatile 'state' 'I' @16  0 (0x00000000)
 - private volatile transient 'head' 'Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node;' @20  null (0x00000000)
 - private volatile transient 'tail' 'Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node;' @24  null (0x00000000)
R14=0x00007f9e13770808: <offset 0x00000000014e0808> in /usr/java/jdk21/lib/server/libjvm.so at 0x00007f9e12290000
R15=0x0000000000000001 is an unknown value
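
A note on reading the numbers above: the `libjvm.so+0x813d5c` offset in the problematic frame is the faulting pc minus the libjvm.so load base shown in the register mapping, and the low bits of `ERR` follow the x86 page-fault error code layout (bit 0 = page present, bit 1 = write, bit 2 = user mode). The sketch below reproduces that arithmetic with values copied from this log. The class and method names are illustrative only. It does not identify the root cause; it only shows that the fault was a user-mode read of an unmapped page, which agrees with `SEGV_MAPERR` and `si_addr` 0x100.

```java
public class HsErrMath {
    // Offset of an absolute address inside a mapped library: address - load base.
    static long offsetInLibrary(long address, long loadBase) {
        return address - loadBase;
    }

    // Decode the low three bits of an x86 page-fault error code (the ERR= value).
    static String describePageFault(long err) {
        boolean present = (err & 0x1) != 0; // 0 = page not mapped, 1 = protection violation
        boolean write   = (err & 0x2) != 0; // 0 = read access, 1 = write access
        boolean user    = (err & 0x4) != 0; // 0 = kernel mode, 1 = user mode
        return (present ? "protection violation" : "page not present")
                + ", " + (write ? "write" : "read")
                + ", " + (user ? "user mode" : "kernel mode");
    }

    public static void main(String[] args) {
        long pc = 0x00007f9e12aa3d5cL;          // RIP from the log
        long libjvmBase = 0x00007f9e12290000L;  // libjvm.so base from the register mapping
        System.out.printf("libjvm.so+0x%x%n", offsetInLibrary(pc, libjvmBase));
        System.out.println(describePageFault(0x4)); // ERR=0x...0004 from the log
    }
}
```

This prints `libjvm.so+0x813d5c` and `page not present, read, user mode`.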
karianna commented 4 months ago

Also see #1088

karianna commented 4 months ago

https://bugs.openjdk.org/browse/JDK-8331735

jjscl8888 commented 4 months ago

Here is the system information for the environment where the crashes occurred.

Kubernetes:

     version: 1.19

Host:

     kernel: 4.19, OS: CentOS
     CPU: 64 × AMD EPYC 7T83 64-Core Processor, or 64 × Intel(R) Xeon(R) Platinum 8369B CPU @ 2.70GHz

Pod:

     OS: Linux 4.19.118-2.el7.centos.x86_64/amd64
     Linux OS: CentOS Linux release 7.9.2009 (Core)

Will this issue be fixed in JDK 21?

karianna commented 4 months ago

> Will this issue be fixed in JDK 21?

There may be a backport; you'll want to follow that issue upstream.

jjscl8888 commented 3 months ago

When is this problem expected to be solved? @karianna

karianna commented 3 months ago

You'll need to track https://bugs.openjdk.org/browse/JDK-8331735 for progress.

CH3CHO commented 3 months ago

> You'll need to track https://bugs.openjdk.org/browse/JDK-8331735 for progress.

However, Java 21 isn't in that issue's list of affected versions, which is why we wondered whether the fix would be backported to 21. Could you add 21 to the list as well?


karianna commented 3 months ago

Added 21

github-actions[bot] commented 4 days ago

We are marking this issue as stale because it has not been updated for a while. This is just a way to keep the support issues queue manageable. It will be closed soon unless the stale label is removed by a committer, or a new comment is made.