dragonwell-project / dragonwell8

Alibaba Dragonwell8 JDK
http://dragonwell-jdk.io
GNU General Public License v2.0

SPECjbb2015 runs for more than 5 hours without finishing, and the process stops writing logs #636

Closed sendaoYan closed 6 months ago

sendaoYan commented 6 months ago

Steps to reproduce the behavior: https://tone.aliyun-inc.com/ws/sclsdnoi/test_result/311350?tab=3

image

The log was last updated at 11:11, the java process started at 10:06:25, and ps aux shows the process in the Sl state.

jstack-2330.log tone_specjbb2015_jdk11_default.log

Execution environment

# uname -a ; cat /etc/os-release ; free -h ; lscpu | head -n 25 ; /usr/bin/java -version ; /usr/bin/java -Xinternalversion
Linux iZ2zedz4j8s8zy9rs7hf3xZ 5.10.134-1316.git.3e9b2ab5b363.al8.x86_64 #1 SMP Mon May 13 13:18:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
NAME="Alibaba Cloud Linux"
VERSION="3 (Soaring Falcon)"
ID="alinux"
ID_LIKE="rhel fedora centos anolis"
VERSION_ID="3"
UPDATE_ID="9"
PLATFORM_ID="platform:al8"
PRETTY_NAME="Alibaba Cloud Linux 3 (Soaring Falcon)"
ANSI_COLOR="0;31"
HOME_URL="https://www.aliyun.com/"

              total        used        free      shared  buff/cache   available
Mem:           30Gi        26Gi       2.8Gi       2.0Mi       1.6Gi       4.0Gi
Swap:            0B          0B          0B
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
BIOS Vendor ID:      Alibaba Cloud
CPU family:          25
Model:               17
Model name:          AMD EPYC 9T24 96-Core Processor
BIOS Model name:     pc-i440fx-2.1
Stepping:            1
CPU MHz:             3699.997
BogoMIPS:            5400.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            32768K
NUMA node0 CPU(s):   0-7
openjdk version "1.8.0_372"
OpenJDK Runtime Environment (Alibaba Dragonwell Extended Edition 8.15.16) (build 1.8.0_372-b03)
OpenJDK 64-Bit Server VM (Alibaba Dragonwell Extended Edition 8.15.16) (build 25.372-b03, mixed mode)
OpenJDK 64-Bit Server VM (25.372-b03) for linux-amd64 JRE (1.8.0_372-b03), built on Sep 26 2023 03:35:48 by "mockbuild" with gcc 10.2.1 20200825 (Alibaba 10.2.1-3.5 2.32)
sendaoYan commented 6 months ago

GC is running almost continuously: image

sendaoYan commented 6 months ago

Logs from a reproduction with GC logging enabled:

gc.log tone_specjbb2015_jdk11_default.log tone_specjbb2015_jdk11_default.log.zip
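
A log like gc.log can be scanned for a stall of this kind by comparing old-generation occupancy before and after each full GC. The following is a minimal sketch, assuming the log uses the JDK 8 Parallel GC format produced with -XX:+PrintGCDetails (a `[ParOldGen: before->after(capacity)]` entry on each `Full GC` line). The sample line is synthetic, built from the occupancy figures quoted in this thread rather than copied from the attachment, and the function names and thresholds are hypothetical.

```python
import re

# Matches the old-generation entry of a JDK 8 Parallel GC full-GC line,
# e.g. "[ParOldGen: 5242543K->5242542K(5242880K)]".
# Assumption: the collector is Parallel GC and -XX:+PrintGCDetails is on.
OLD_GEN = re.compile(r"\[ParOldGen: (\d+)K->(\d+)K\((\d+)K\)\]")


def old_gen_samples(lines):
    """Yield (before_kb, after_kb, capacity_kb) for each full-GC line."""
    for line in lines:
        if "Full GC" not in line:
            continue
        match = OLD_GEN.search(line)
        if match:
            yield tuple(int(group) for group in match.groups())


def is_thrashing(samples, min_occupancy=0.99, max_reclaimed_kb=1024, min_events=5):
    """Return True if min_events consecutive full GCs each leave the old
    generation at least min_occupancy full while reclaiming at most
    max_reclaimed_kb. The thresholds are illustrative, not tuned values."""
    streak = 0
    for before_kb, after_kb, capacity_kb in samples:
        nearly_full = after_kb / capacity_kb >= min_occupancy
        little_reclaimed = before_kb - after_kb <= max_reclaimed_kb
        if nearly_full and little_reclaimed:
            streak += 1
            if streak >= min_events:
                return True
        else:
            streak = 0
    return False


# Synthetic sample line (not taken from the attached gc.log).
SAMPLE_LINE = "[Full GC (Ergonomics) [ParOldGen: 5242543K->5242542K(5242880K)], 3.0 secs]"

if __name__ == "__main__":
    samples = list(old_gen_samples([SAMPLE_LINE] * 5))
    print("thrashing:", is_thrashing(samples))
```

To check a real log, pass the open file instead of the sample list, for example `is_thrashing(old_gen_samples(open("gc.log")))`. A different collector would need a different pattern.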

sendaoYan commented 6 months ago

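After re-running with GC logging enabled, the first two runs completed normally, but the third ran for a long time without finishing. The GC log shows the JVM collecting the old generation continuously for 5 hours. Old-generation occupancy was 5242542K/5242880K = 99.99%, and it took about 8 full GCs to reclaim 1K of memory (5242543K->5242542K). Each full GC took roughly 3 s, so reclaiming 1K of old-generation memory took about 30 s.

The JVM options used to launch SPECjbb2015 include -XX:ObjectAlignmentInBytes=32. The default value is 8; raising it to 32 increases object alignment padding, which puts pressure on old-generation memory until it reaches the limit. SPECjbb2015 then triggers full GCs frequently, and each one reclaims very little. On JDK 8, -XX:ObjectAlignmentInBytes=32 is recommended only when the JVM heap is 100G or larger.

As for why the problem appears intermittently, it may be related to the degree of old-generation fragmentation. In runs that finish normally, the old-generation heap layout is fairly compact; in runs that fail to finish, fragmentation is severe.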
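
The effect of the alignment setting can be illustrated with a little arithmetic. This is a rough sketch, not a measurement: it rounds a handful of hypothetical object sizes up to 8-byte and 32-byte boundaries to show how padding grows, then recomputes the old-generation occupancy from the figures above. The sample sizes and helper names are assumptions chosen for illustration; the real overhead depends on the benchmark's actual object size distribution.

```python
def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def total_footprint(sizes, alignment):
    """Sum of object sizes after each is padded to the given alignment."""
    return sum(align_up(size, alignment) for size in sizes)


# Hypothetical raw object sizes in bytes, chosen only for illustration.
SAMPLE_SIZES = [16, 24, 40, 56, 72]

default_total = total_footprint(SAMPLE_SIZES, 8)   # default alignment
padded_total = total_footprint(SAMPLE_SIZES, 32)   # ObjectAlignmentInBytes=32
overhead = padded_total / default_total - 1.0

# Figures quoted from the GC log analysis.
old_used_kb = 5242542
old_capacity_kb = 5242880
occupancy = old_used_kb / old_capacity_kb
free_kb = old_capacity_kb - old_used_kb

# About 8 full GCs at roughly 3 s each per reclaimed KB.
seconds_per_kb = 8 * 3

if __name__ == "__main__":
    print(f"footprint at 8-byte alignment:  {default_total} bytes")
    print(f"footprint at 32-byte alignment: {padded_total} bytes")
    print(f"extra space from padding:       {overhead:.1%}")
    print(f"old-gen occupancy: {occupancy:.2%}, free: {free_kb}K")
    print(f"time per reclaimed KB: about {seconds_per_kb} s")
```

With small objects, most of the extra footprint is padding, which is why a larger alignment is harder to absorb in a heap of a few gigabytes than in a very large one.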

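Recommendations:

  1. This performance test may run on a range of ECS instance sizes (2x, 4x, 8x, 16x, and so on), so the JVM options should stay at their defaults wherever possible: revert -Dspecjbb.customerDriver.threads=64 to its default; revert -XX:MaxTenuringThreshold=15 to its default; set -XX:ParallelGCThreads=80 to the number of CPU cores on the system, or use the JVM default; revert -XX:ObjectAlignmentInBytes=32 to its default, since this option is not recommended for small heaps.
  2. Enable the UsePerfData option, so that the cause can be investigated by logging in to the environment when a problem occurs.
  3. Enable GC logging during the test and redirect the GC log to a file. When the test completes, use the tone interface upload_testlogs to upload the GC log to T-one, so that logs from normal and abnormal runs can be viewed and compared. SPECjbb is a compute-intensive and memory-read/write-intensive benchmark that is not sensitive to I/O, so adding GC logging does not affect the score.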
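
The recommendations above could be applied in a test harness along the following lines. This is only a sketch under assumptions: the helper names are hypothetical, the harness is assumed to hold its JVM options as a list of strings, and the GC logging flags shown (-Xloggc, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps) are the JDK 8 forms. Uploading the log through upload_testlogs is left out because its interface is not described here.

```python
import os

# Options the recommendations suggest leaving at their defaults.
TUNED_PREFIXES = (
    "-Dspecjbb.customerDriver.threads=",
    "-XX:MaxTenuringThreshold=",
    "-XX:ParallelGCThreads=",
    "-XX:ObjectAlignmentInBytes=",
)


def strip_tuned_flags(flags):
    """Drop the hand-tuned options so the JVM falls back to its defaults."""
    return [flag for flag in flags if not flag.startswith(TUNED_PREFIXES)]


def diagnostic_flags(gc_log_path, cpu_count=None, pin_gc_threads=False):
    """Return JDK 8 options for perf data and GC logging to a file.

    pin_gc_threads=True sets ParallelGCThreads to the core count;
    otherwise the JVM default is used, as the recommendation allows.
    """
    flags = [
        "-XX:+UsePerfData",
        f"-Xloggc:{gc_log_path}",
        "-XX:+PrintGCDetails",
        "-XX:+PrintGCDateStamps",
    ]
    if pin_gc_threads:
        cpus = cpu_count or os.cpu_count() or 1
        flags.append(f"-XX:ParallelGCThreads={cpus}")
    return flags


if __name__ == "__main__":
    original = [
        "-Dspecjbb.customerDriver.threads=64",
        "-XX:MaxTenuringThreshold=15",
        "-XX:ParallelGCThreads=80",
        "-XX:ObjectAlignmentInBytes=32",
    ]
    print(strip_tuned_flags(original) + diagnostic_flags("gc.log"))
```

Deriving the GC thread count from the machine instead of hard-coding 80 keeps the same option list usable across instance sizes.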

gc.log