DataDog / dd-trace-java

Datadog APM client for Java
https://docs.datadoghq.com/tracing/languages/java
Apache License 2.0

dd-trace-java v1.25.1 crashes the JVM #6382

Open michal-kusy opened 11 months ago

michal-kusy commented 11 months ago

Hi DD team, we set up the DD agent in our production env last week and have had 6 random JVM crashes since then.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f6661312634, pid=1, tid=1073020
#
# JRE version: OpenJDK Runtime Environment Corretto-17.0.5.8.1 (17.0.5+8) (build 17.0.5+8-LTS)
# Java VM: OpenJDK 64-Bit Server VM Corretto-17.0.5.8.1 (17.0.5+8-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libjavaProfiler1675892613987672088.so+0x39634]  WallClock::signalHandler(int, siginfo_t*, void*, unsigned long long)+0x244
#
# Core dump will be written. Default location: //core
#
# JFR recording file will be written. Location: //hs_err_pid1.jfr
#
# An error report file with more information is saved as:
# //hs_err_pid1.log

[error occurred during error reporting (), id 0xb, SIGSEGV (0xb) at pc=0x00007f6688e1ec6c]

#
# If you would like to submit a bug report, please visit:
#   https://github.com/corretto/corretto-17/issues/
#

jbachorik commented 11 months ago

Hi @michal-kusy

Thanks for reporting the issue! Would it be possible to attach the hs_err_pid1.log or any other hs_err_*.log file? I would need that to see the complete stacktrace leading to the SIGSEGV.

If the log file contains any PII and you don't want it to be available publicly, you can also open a support ticket and provide the logs there.

Thanks!

michal-kusy commented 11 months ago

Hi @jbachorik,

thank you for taking care of dd-trace-java! I submitted support ticket #1479764 with attached hs_err_pid1.log.

jbachorik commented 11 months ago

In the meantime, can you add -Ddd.profiling.ddprof.wall.enabled=false? This will disable the wall-clock profiler, which is the component that is crashing. I inspected the crash log file but there seems to be something totally off because even the internal JVM stack unwinder fails when trying to generate the crash stacktrace :/
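Since a -D option sets a JVM system property, one quick way to confirm that the flag actually reached the JVM inside the container is to read it back at startup. The following is only a minimal sketch: the class `WallClockFlagCheck` and its `describe` method are hypothetical names made up for illustration and are not part of dd-trace-java; only the property name comes from the flag above.

```java
// Hypothetical helper, not part of dd-trace-java: reports whether the
// wall-clock profiler flag mentioned in this thread is visible to the JVM.
final class WallClockFlagCheck {
    // Property name taken from the -D flag quoted in this thread.
    static final String PROPERTY = "dd.profiling.ddprof.wall.enabled";

    // Returns a human-readable description of the property's current state.
    static String describe() {
        String value = System.getProperty(PROPERTY);
        if (value == null) {
            return PROPERTY + " is not set";
        }
        return PROPERTY + "=" + value;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging the result of `describe()` early in application startup would show whether the option survived the image build and container launch configuration.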

Is this a docker image? If yes, can you share the tag so I can try to reproduce? (I tried by downloading corretto binaries but with no luck - no crashes for me ...)

michal-kusy commented 11 months ago

Yeah, it's a Jib-built Docker image with amazoncorretto:17.0.5-alpine as the base image atm. The next release is going to be based on amazoncorretto:17.0.9-alpine.

jbachorik commented 10 months ago

Hi @michal-kusy - I have succeeded in identifying the crash location and I have proposed a fix PR. Unfortunately, I am not able to reproduce the crash, as it is most likely dependent on a high rate of thread creation and a bit of 'luck' to handle the profiling signal at a moment when the thread is not fully initialized or is being deinitialized.

I wonder if you would be able to test a custom build of the profiler library to see if it helps with the crashes? If not, we will just get the fix released as part of dd-trace-java and you could test the regular release later.
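For anyone who wants to try provoking the suspected condition (many threads being created and torn down while the profiler is sampling), a thread-churn loop along the following lines might be a starting point. This is only a sketch under assumptions: `ThreadChurn` and `churn` are hypothetical names, the batch sizes are arbitrary, and nothing here is taken from dd-trace-java or the fix PR. It would have to be run with the agent attached and wall-clock profiling enabled, and there is no guarantee that it triggers the crash.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stress harness, not taken from dd-trace-java or the fix PR.
final class ThreadChurn {
    // Starts `batches` rounds of `threadsPerBatch` short-lived threads and
    // returns how many of them ran to completion.
    static long churn(int batches, int threadsPerBatch) {
        AtomicLong completed = new AtomicLong();
        for (int b = 0; b < batches; b++) {
            Thread[] threads = new Thread[threadsPerBatch];
            for (int i = 0; i < threadsPerBatch; i++) {
                // Each thread does almost no work, so most of its lifetime
                // is spent being initialized or torn down.
                threads[i] = new Thread(completed::incrementAndGet, "churn-" + b + "-" + i);
                threads[i].start();
            }
            for (Thread t : threads) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while joining", e);
                }
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Batch count and size are arbitrary illustration values.
        System.out.println("completed threads: " + churn(1_000, 64));
    }
}
```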