Open michal-kusy opened 11 months ago
Hi @michal-kusy
Thanks for reporting the issue!
Would it be possible to attach the hs_err_pid1.log or any other hs_err_*.log file? I would need that to see the complete stack trace leading to the SIGSEGV.
If the log file contains any PII and you don't want it to be available publicly, you can also open a support ticket and provide the logs there.
Thanks!
Hi @jbachorik,
thank you for taking care of dd-trace-java! I submitted support ticket #1479764 with attached hs_err_pid1.log.
In the meantime, can you add -Ddd.profiling.ddprof.wall.enabled=false?
This will disable the wallclock profiler, which is what is crashing. I inspected the crash log file, but something seems to be totally off, because even the internal JVM stack unwinder fails when trying to generate the crash stack trace :/
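One way to pass that flag without rebuilding the image is sketched below. It assumes the flag is supplied through the standard JAVA_TOOL_OPTIONS environment variable, which the JVM reads at startup. How that variable reaches the container (Dockerfile ENV, orchestrator env settings, or the jib jvmFlags setting instead) depends on your setup and is not something this thread specifies.

```shell
# Sketch only: append the system property that turns the wallclock profiler off
# to whatever JVM options are already set. JAVA_TOOL_OPTIONS is read by the JVM
# itself, so the application's launch command does not need to change.
export JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+$JAVA_TOOL_OPTIONS }-Ddd.profiling.ddprof.wall.enabled=false"

# Show the effective value for a quick sanity check.
echo "$JAVA_TOOL_OPTIONS"
```

The JVM normally prints a "Picked up JAVA_TOOL_OPTIONS" line on startup, which is a simple way to confirm the flag was applied.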
Is this a docker image? If yes, can you share the tag so I can try to reproduce? (I tried by downloading corretto binaries but with no luck - no crashes for me ...)
Yeah, it's a jib-built Docker image with amazoncorretto:17.0.5-alpine as the base image atm.
The next release is going to be based on amazoncorretto:17.0.9-alpine.
Hi @michal-kusy - I have succeeded in identifying the crash location and I have proposed a fix PR. Unfortunately, I am not able to reproduce the crash, as it most likely depends on a high rate of thread creation and a bit of 'luck': the profiling signal has to be handled at a moment when the thread is not fully initialized or is being deinitialized.
I wonder if you would be able to test a custom build of the profiler library to see if it helps with the crashes? If not, we will just get the fix released as part of dd-trace-java and you can test the regular release later.
Hi DD team, we set up DD agent in our production env last week and we have got 6 random JVM crashes since then.