dragonwell-project / dragonwell11

Alibaba Dragonwell11 JDK
https://www.aliyun.com/product/dragonwell
GNU General Public License v2.0

[upstream] x64: runtime/modules/ModuleStress/ModuleStressGC.java crashes randomly with low probability when run with -Xmixed: TypeStackSlotEntries::clean_weak_klass_links(bool) #754

Open sendaoYan opened 8 months ago

sendaoYan commented 8 months ago

job:https://tone.aliyun-inc.com/ws/xesljfzh/test_result/259856?tab=1

Steps to reproduce the behavior:

export test=test/hotspot/jtreg/runtime/modules/ModuleStress/ModuleStressGC.java
function runJtreg() { jtreg -Xint -XX:+UseCompactObjectHeaders -ea -esa -timeoutFactor:4 -v:fail,error,time,nopass -nr -w $dir/index-$1 $test &> $dir/$1.log ; if [[ 0 -ne $? ]] ; then echo -n "$1 " ; else rm -rf $dir/index-$1 $dir/$1.log ; fi ; } ; export -f runJtreg ; export dir="tmp-jtreg-"`basename ${test##* } .java` ; rm -rf $dir ; mkdir -p $dir ; time seq 100000 | xargs -i -n 1 -P `nproc` bash -c "runJtreg {}" ; echo total fail number: `ls $dir/*.log 2> /dev/null | wc | awk '{print $1}'`

Reproduction rate: 1/1000
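For readability, the one-liner above can be unpacked into a small function. This is a minimal sketch, not the reporter's script: `stress_repeat` and its arguments are hypothetical names, it takes an arbitrary command instead of hard-coding jtreg, and it omits the per-run jtreg work directory (`-w`) handling that the original performs.

```shell
# stress_repeat COUNT JOBS DIR CMD [ARGS...]
# Runs CMD COUNT times, JOBS at a time, and keeps DIR/<id>.log only for
# failing runs. Hypothetical helper; names are illustrative.
stress_repeat() {
  count="$1"; jobs="$2"; dir="$3"; shift 3
  rm -rf "$dir"; mkdir -p "$dir"
  # Each xargs worker gets: DIR, the run id, then the command to execute.
  seq "$count" | xargs -I{} -P "$jobs" sh -c '
    dir="$1"; id="$2"; shift 2
    if "$@" > "$dir/$id.log" 2>&1; then
      rm -f "$dir/$id.log"          # passing run: discard its log
    else
      printf "%s " "$id"            # failing run: report its id
    fi' _ "$dir" {} "$@"
  echo
  echo "total fail number: $(find "$dir" -name "*.log" | wc -l | tr -d " ")"
}
```

Calling it as `stress_repeat 100000 "$(nproc)" "$dir" jtreg <the options from the command above> "$test"` would approximate the original loop; the ids printed on the first line identify the logs that were kept.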

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f2d7d002c4f, pid=1178239, tid=1178452
#
# JRE version: OpenJDK Runtime Environment (Alibaba Dragonwell Extended Edition)-11.0.20.17+8-GA (11.0.21.17+8) (build 11.0.21.17+8)
# Java VM: OpenJDK 64-Bit Server VM (Alibaba Dragonwell Extended Edition)-11.0.20.17+8-GA (11.0.21.17+8, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xc02c4f]  TypeStackSlotEntries::clean_weak_klass_links(bool)+0x3f
#
# Core dump will be written. Default location: /tmp/tone/run/jtreg/jt-work/hotspot_jtreg/runtime/modules/ModuleStress/ModuleStressGC/core.1178239
#
# An error report file with more information is saved as:
# /tmp/tone/run/jtreg/jt-work/hotspot_jtreg/runtime/modules/ModuleStress/ModuleStressGC/hs_err_pid1178239.log
#
# If you would like to submit a bug report, please visit:
#   mailto:dragonwell_use@googlegroups.com
#
];
 stderr: []
 exitValue = 134
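When many runs fail, it can help to confirm that they all crash in the same frame. The following is a rough sketch that tallies the line following `# Problematic frame:` across a set of logs; `summarize_frames` is a hypothetical helper name, and it assumes each log contains the hs_err header in the same layout as the excerpt above.

```shell
# summarize_frames LOG...
# Counts how often each "Problematic frame" line occurs across the logs,
# most frequent first. Hypothetical helper; assumes the frame is printed
# on the line right after "# Problematic frame:".
summarize_frames() {
  awk '/^# Problematic frame:/ {
         getline                    # move to the frame line itself
         sub(/^#[ \t]*/, "")        # drop the leading "# "
         count[$0]++
       }
       END { for (f in count) printf "%d\t%s\n", count[f], f }' "$@" |
    sort -rn
}
```

Because the full frame line includes the library offset, frames from different builds are counted separately.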

Core dump file:

image

JDK version

> uname -a ; cat /etc/os-release ; free -h ; lscpu | head -n 25 ; java -version ; java -Xinternalversion
Linux j66e07344.sqa.eu95 5.10.112-005.ali5000.alios7.x86_64 #1 SMP Fri Jun 24 15:46:48 CST 2022 x86_64 x86_64 x86_64 GNU/Linux
NAME="Alibaba Group Enterprise Linux Server"
VERSION="7.2 (Paladin)"
ID="alios"
ID_LIKE="fedora anolis"
VERSION_ID="7.2"
PRETTY_NAME="Alibaba Group Enterprise Linux Server 7.2 (Paladin)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:alibaba:enterprise_linux:7.2:GA:server"
HOME_URL="https://os.alibaba-inc.com/"
BUG_REPORT_URL="https://os.alibaba-inc.com/"

ALIBABA_BUGZILLA_PRODUCT="Alibaba Group Enterprise Linux 7"
ALIBABA_BUGZILLA_PRODUCT_VERSION=7.2
ALIBABA_SUPPORT_PRODUCT="Alibaba Group Enterprise Linux"
ALIBABA_SUPPORT_PRODUCT_VERSION=7.2
              total        used        free      shared  buff/cache   available
Mem:           187G        9.8G        137G        2.1M         40G        176G
Swap:            0B          0B          0B
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                96
On-line CPU(s) list:   0-95
Thread(s) per core:    2
Core(s) per socket:    24
Socket(s):             2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
Stepping:              4
CPU MHz:               2500.000
CPU max MHz:           3100.0000
CPU min MHz:           1000.0000
BogoMIPS:              5000.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              33792K
NUMA node0 CPU(s):     0-95
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d

openjdk version "11.0.21.17" 2023-10-17
OpenJDK Runtime Environment (Alibaba Dragonwell Extended Edition)-11.0.20.17+8-GA (build 11.0.21.17+8)
OpenJDK 64-Bit Server VM (Alibaba Dragonwell Extended Edition)-11.0.20.17+8-GA (build 11.0.21.17+8, mixed mode)
OpenJDK 64-Bit Server VM (11.0.21.17+8) for linux-amd64 JRE (11.0.21.17+8), built on Dec 12 2023 03:41:33 by "" with gcc 7.5.0

jtreg.stdout.log

hs_err_pid1178239.log

sendaoYan commented 8 months ago

Reproducible on a 32-core x86 physical machine (100.81.244.26); not reproduced in 100,000 runs on a 96-core x86 physical machine.

image

22365.log 40491.log 42627.log 45295.log 47591.log 48249.log

Core dump files: http://114.55.64.175:8666/compiler-ci-bucket/jdk/core-dump-files/dragonwell11/issue754/

JDK binary: https://dragonwell.oss-cn-shanghai.aliyuncs.com/11.0.21.18.9-test/Alibaba_Dragonwell_Extended_11.0.21.18.9_x64_linux.tar.gz

sendaoYan commented 8 months ago

The option combination -Xmixed -XX:+UseCompactObjectHeaders also reproduces the crash:

994.log

sendaoYan commented 8 months ago

Current release build: reproduction rate 12/40,000, job: https://tone.aliyun-inc.com/ws/xesljfzh/test_result/262166

Previous dragonwell11 release (11.0.20.17.8): reproduction rate 3/40,000, job: https://tone.aliyun-inc.com/ws/xesljfzh/test_result/262236

Current temurin build: reproduction rate 0/40,000, job: https://tone.aliyun-inc.com/ws/xesljfzh/test_result/262212

Current fastdebug build: not reproduced in 10,000 runs, job: https://tone.aliyun-inc.com/ws/xesljfzh/test_result/262168
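As a rough check on how much a "not reproduced" result says, one can estimate the chance of seeing zero crashes in N runs at a given crash rate. The sketch below assumes runs are independent and that the fastdebug build crashes at the release rate of 12 in 40,000; both are assumptions, and fastdebug timing may well differ. `p_zero` is a hypothetical helper name.

```shell
# p_zero FAILS RUNS N
# Probability of observing 0 failures in N independent runs when the
# per-run failure rate is FAILS/RUNS, i.e. (1 - FAILS/RUNS)^N.
p_zero() {
  awk -v f="$1" -v r="$2" -v n="$3" \
    'BEGIN { printf "%.3f\n", exp(n * log(1 - f / r)) }'
}

# 10,000 fastdebug runs at the release rate of 12/40,000
p_zero 12 40000 10000
```

Under those assumptions the result is about 5%, so a clean run of 10,000 fastdebug iterations makes the bug less likely there but does not rule it out.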

It is unclear whether this issue is related to the random java.lang.StackOverflowError described in https://code.alibaba-inc.com/xcode/jdk11/issues/542122.

sendaoYan commented 8 months ago

The problem may have been introduced starting with version 11.0.15.11.9.

image

sendaoYan commented 8 months ago

temurin11 reproduction rate: 3/200,000

dragonwell11.0.11.6 reproduction rate: 1/150,000

dragonwell11.0.21.18.9 reproduction rate: 2/50,000
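Since the three samples above have different sizes, normalizing them to crashes per 100,000 runs makes them easier to compare. This is only an illustrative calculation on the reported counts; `rate_per_100k` is a hypothetical helper name.

```shell
# rate_per_100k FAILS RUNS
# Prints the failure count scaled to 100,000 runs, one decimal place.
rate_per_100k() {
  awk -v f="$1" -v n="$2" 'BEGIN { printf "%.1f\n", f * 100000 / n }'
}

rate_per_100k 3 200000   # temurin11
rate_per_100k 1 150000   # dragonwell11.0.11.6
rate_per_100k 2 50000    # dragonwell11.0.21.18.9
```

This gives 1.5, 0.7 and 4.0 crashes per 100,000 runs respectively. With only one to three crashes in each sample, these differences may be within noise, so they are weak evidence for ranking the builds.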

sendaoYan commented 8 months ago

Reproduced once with the fastdebug binary, on 47.97.60.108

hs_err_pid2136294.log

Core file: http://114.55.64.175:8666/compiler-ci-bucket/jdk/core-dump-files/dragonwell11/issue754/fastdebug/