elastic / elastic-otel-java

Apache License 2.0

Jvmti-access lib doesn't support Alpine Linux #228

Closed. JonasKunz closed this issue 2 months ago.

JonasKunz commented 5 months ago

Currently, the universal-profiling-integration crashes on Linux distros that ship musl instead of glibc (e.g. Alpine):

[otel.javaagent 2024-04-25 08:31:34:144 +0000] [main] DEBUG co.elastic.otel.UniversalProfilingProcessor - Opening profiler correlation socket /tmp/essockGld1x67O
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000003250, pid=1, tid=7
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.3+9 (21.0.3+9) (build 21.0.3+9-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.3+9 (21.0.3+9-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# C  [elastic-jvmti-linux-arm64-4813494d137e1631bba301d5acab6e7b-b804a293859bee2eb1968b3ee9e92141.so+0x3890]  init_have_lse_atomics+0xc
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# //hs_err_pid1.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

At the moment, the only way to prevent this is to disable the profiling integration via ELASTIC_OTEL_UNIVERSAL_PROFILING_INTEGRATION_ENABLED=false.

We should fix this by either:

JonasKunz commented 5 months ago

I did some quick experiments. It looks like statically linking musl into a shared library is not a good idea, because the library will still operate on the same shared resources as the JVM process. This causes problems, e.g. with malloc.

So I guess the best solution is to provide separate musl variants of the binaries and detect musl at runtime.
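
For illustration, here is a minimal sketch of what runtime detection could look like, assuming the check runs in Java before the native library is loaded. It scans /proc/self/maps for the musl loader, which on musl systems is the same file as libc. The class and method names (MuslDetector, mapsMentionMusl, isMusl) are hypothetical, and this is not necessarily the approach the project ended up with.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of elastic-otel-java.
public class MuslDetector {

    // Returns true if any line of a /proc/<pid>/maps listing refers to a
    // mapped file whose name looks like the musl loader or libc.
    public static boolean mapsMentionMusl(List<String> mapsLines) {
        for (String line : mapsLines) {
            // The pathname column, when present, starts at the first '/'.
            int slash = line.indexOf('/');
            if (slash < 0) {
                continue; // anonymous mapping, [heap], [stack], ...
            }
            String path = line.substring(slash);
            String file = path.substring(path.lastIndexOf('/') + 1);
            if (file.startsWith("ld-musl-") || file.startsWith("libc.musl-")) {
                return true;
            }
        }
        return false;
    }

    // Best-effort check for the current JVM process. Falls back to false
    // (assume glibc) if /proc is unavailable or unreadable.
    public static boolean isMusl() {
        try {
            return mapsMentionMusl(Files.readAllLines(Path.of("/proc/self/maps")));
        } catch (IOException | RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("musl detected: " + isMusl());
    }
}
```

The result of isMusl() could then decide which variant of the native binary gets extracted and loaded. Falling back to false keeps the current behaviour on systems where the check cannot run.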

codefromthecrypt commented 3 months ago

In case someone wants the full error log from a recent build of Alpine with gcompat: hs_err_pid45.log