Karm / mandrel-integration-tests

Integration tests for GraalVM and its Mandrel distribution. Runs Quarkus, Helidon and Micronaut applications and small targeted reproducers. The focus is solely on native-image utility and compilation of Java applications into native executables.
Apache License 2.0
5 stars 3 forks source link

JFRTest#jfrPerfTest fails with Quarkus `main` #236

Closed zakkak closed 6 months ago

zakkak commented 7 months ago

How to reproduce

git clone https://github.com/quarkusio/quarkus.git
cd quarkus
./mvnw -Dquickly
cd ../
git clone https://github.com/Karm/mandrel-integration-tests
cd mandrel-integration-tests
mvn clean verify -Dquarkus.version=999-SNAPSHOT -Ptestsuite -Dtest=JFRTest#jfrPerfTest

Failure:

[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 78.177 s <<< FAILURE! - in org.graalvm.tests.integration.JFRTest
[ERROR] jfrPerfTest{TestInfo}  Time elapsed: 77.691 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: Timeout 10s was reached. Empty webpage does not contain string: `hello' ==> expected: <true> but was: <false>
    at org.graalvm.tests.integration.JFRTest.runBenchmarkForApp(JFRTest.java:355)
    at org.graalvm.tests.integration.JFRTest.startComparisonForBenchmark(JFRTest.java:254)
    at org.graalvm.tests.integration.JFRTest.jfrPerfTestRun(JFRTest.java:214)
    at org.graalvm.tests.integration.JFRTest.jfrPerfTest(JFRTest.java:176)

Additional info

The regression is not caused by a GraalVM change, building with Mandrel 23.1.1.0 also doesn't work. Setting quarkus.version to 3.6.0 makes the test fail in a different stage (possibly a local issue?)

zakkak commented 7 months ago

First known bad commit seems to be 61edae46e0287ecaee7f5b3ef728ab4a6d465c02 in https://github.com/graalvm/mandrel/actions/runs/7026828114/job/19120880487

(Interestingly this started happening with the HELIDON_QUICKSTART failures https://github.com/Karm/mandrel-integration-tests/issues/226)

Last known working commit seems to be 670b43cac8587d27d7810af41c22ca3cf161df23 in https://github.com/graalvm/mandrel/actions/runs/7012992753/job/19078995046

zakkak commented 6 months ago

The regression was introduced with https://github.com/quarkusio/quarkus/pull/37347 and specifically with the runtime re-initialization of io.netty.util.internal.PlatformDependent0.

zakkak commented 6 months ago

Inspecting the static fields of io.netty.util.internal.PlatformDependent0 when build-time-initialized and when run-time-initialized I don't see any difference that could explain the failure (at least not at first sight).

Java mode

maxDirectMemory= 15 GB
logger = JdkLogger(io.netty.util.internal.PlatformDependent0)
ADDRESS_FIELD_OFFSET = 16
BYTE_ARRAY_BASE_OFFSET = 16
INT_ARRAY_BASE_OFFSET = 16
INT_ARRAY_INDEX_SCALE = 4
LONG_ARRAY_BASE_OFFSET = 16
LONG_ARRAY_INDEX_SCALE = 8
DIRECT_BUFFER_CONSTRUCTOR = null
EXPLICIT_NO_UNSAFE_CAUSE = null
ALLOCATE_ARRAY_METHOD = null
ALIGN_SLICE = public final java.nio.ByteBuffer java.nio.ByteBuffer.alignedSlice(int)
JAVA_VERSION = 21
IS_ANDROID = false
STORE_FENCE_AVAILABLE = true
UNSAFE_UNAVAILABILITY_CAUSE = null
INTERNAL_UNSAFE = null
RUNNING_IN_NATIVE_IMAGE = false
IS_EXPLICIT_TRY_REFLECTION_SET_ACCESSIBLE = false
UNSAFE = sun.misc.Unsafe@31cefde0
HASH_CODE_ASCII_SEED = -1028477387
HASH_CODE_C1 = -862048943
HASH_CODE_C2 = 461845907
UNSAFE_COPY_THRESHOLD = 1048576
UNALIGNED = true
BITS_MAX_DIRECT_MEMORY = 16634609664
$assertionsDisabled = true

Native mode build-time-initialized

maxDirectMemory= 22 GB
logger = JdkLogger(io.netty.util.internal.PlatformDependent0)
ADDRESS_FIELD_OFFSET = 24
BYTE_ARRAY_BASE_OFFSET = 16
INT_ARRAY_BASE_OFFSET = 16
INT_ARRAY_INDEX_SCALE = 4
LONG_ARRAY_BASE_OFFSET = 16
LONG_ARRAY_INDEX_SCALE = 8
DIRECT_BUFFER_CONSTRUCTOR = null
EXPLICIT_NO_UNSAFE_CAUSE = null
ALLOCATE_ARRAY_METHOD = null
ALIGN_SLICE = public final java.nio.ByteBuffer java.nio.ByteBuffer.alignedSlice(int)
JAVA_VERSION = 21
IS_ANDROID = false
STORE_FENCE_AVAILABLE = true
UNSAFE_UNAVAILABILITY_CAUSE = null
INTERNAL_UNSAFE = null
RUNNING_IN_NATIVE_IMAGE = true
IS_EXPLICIT_TRY_REFLECTION_SET_ACCESSIBLE = true
UNSAFE = sun.misc.Unsafe@65e6a946
HASH_CODE_ASCII_SEED = -1028477387
HASH_CODE_C1 = -862048943
HASH_CODE_C2 = 461845907
UNSAFE_COPY_THRESHOLD = 1048576
UNALIGNED = true
BITS_MAX_DIRECT_MEMORY = -1
$assertionsDisabled = true

Native mode run-time-initialized

maxDirectMemory= 49 GB
logger = JdkLogger(io.netty.util.internal.PlatformDependent0)
ADDRESS_FIELD_OFFSET = 24
BYTE_ARRAY_BASE_OFFSET = 16
INT_ARRAY_BASE_OFFSET = 16
INT_ARRAY_INDEX_SCALE = 4
LONG_ARRAY_BASE_OFFSET = 16
LONG_ARRAY_INDEX_SCALE = 8
DIRECT_BUFFER_CONSTRUCTOR = null
EXPLICIT_NO_UNSAFE_CAUSE = null
ALLOCATE_ARRAY_METHOD = null
ALIGN_SLICE = public final java.nio.ByteBuffer java.nio.ByteBuffer.alignedSlice(int)
JAVA_VERSION = 21
IS_ANDROID = false
STORE_FENCE_AVAILABLE = true
UNSAFE_UNAVAILABILITY_CAUSE = null
INTERNAL_UNSAFE = null
RUNNING_IN_NATIVE_IMAGE = true
IS_EXPLICIT_TRY_REFLECTION_SET_ACCESSIBLE = true
UNSAFE = sun.misc.Unsafe@4417167b
HASH_CODE_ASCII_SEED = -1028477387
HASH_CODE_C1 = -862048943
HASH_CODE_C2 = 461845907
UNSAFE_COPY_THRESHOLD = 1048576
UNALIGNED = true
BITS_MAX_DIRECT_MEMORY = -1
$assertionsDisabled = true

Diff

diff --git a/native-build-time-init-mode.txt b/native-run-time-init-mode.txt
index 6444b24..d0fa1ef 100644
--- a/native-build-time-init-mode.txt
+++ b/native-run-time-init-mode.txt
@@ -1,4 +1,4 @@
-maxDirectMemory= 22 GB
+maxDirectMemory= 49 GB
 logger = JdkLogger(io.netty.util.internal.PlatformDependent0)
 ADDRESS_FIELD_OFFSET = 24
 BYTE_ARRAY_BASE_OFFSET = 16
@@ -17,7 +17,7 @@ UNSAFE_UNAVAILABILITY_CAUSE = null
 INTERNAL_UNSAFE = null
 RUNNING_IN_NATIVE_IMAGE = true
 IS_EXPLICIT_TRY_REFLECTION_SET_ACCESSIBLE = true
-UNSAFE = sun.misc.Unsafe@65e6a946
+UNSAFE = sun.misc.Unsafe@4417167b
 HASH_CODE_ASCII_SEED = -1028477387
 HASH_CODE_C1 = -862048943
 HASH_CODE_C2 = 461845907
zakkak commented 6 months ago

Although still not clear why, setting io.netty.tryReflectionSetAccessible=false, at build-time (i.e. "reverting" https://github.com/netty/netty/pull/12638) seems to resolve (or work-around) the issue.

Note however, that setting io.netty.tryReflectionSetAccessible=false at run-time doesn't. Hinting that some build-time initialization effects remain.

zakkak commented 6 months ago

Faster reproducer:

git clone https://github.com/quarkusio/quarkus.git
cd quarkus
./mvnw -Dquickly
cd ../
git clone https://github.com/Karm/mandrel-integration-tests
cd mandrel-integration-tests/apps/jfr-native-image-performance
mvn package -Pnative -Dquarkus.version=999-SNAPSHOT -Dquarkus.native.monitoring=jfr
./target/jfr-runner

In another terminal run curl http://localhost:8080/hello. Instead of hello we get:

curl: (52) Empty reply from server

Note that without -Dquarkus.native.monitoring=jfr (which essentially passes --enable-monitoring=heapdump,jfr to native-image) the server acts as expected. cc @roberttoyonaga

Enabling DEBUG logging doesn't show anything interesting and setting it to ALL results in a build failure (see https://github.com/quarkusio/quarkus/issues/37598).

roberttoyonaga commented 6 months ago

Hi @zakkak I can look into this

zakkak commented 6 months ago

Hi @zakkak I can look into this

@roberttoyonaga no need to jump in yet, just pinging you FYI. I will keep looking further and let you know if needed. Thanks!

zakkak commented 6 months ago

Inspecting the static fields of io.netty.util.internal.PlatformDependent0 and io.netty.util.internal.PlatformDependent with JFR monitoring disabled versus with it enabled I see the following differences:

Native mode build-time-initialized

diff --git a/native-build-time-init-mode.txt b/native-build-time-init-mode-jfr.txt
index f2e49be..3cea102 100644
--- a/native-build-time-init-mode.txt
+++ b/native-build-time-init-mode-jfr.txt
@@ -6,18 +6,18 @@ PlatformDependent0.INT_ARRAY_BASE_OFFSET = 16
 PlatformDependent0.INT_ARRAY_INDEX_SCALE = 4
 PlatformDependent0.LONG_ARRAY_BASE_OFFSET = 16
 PlatformDependent0.LONG_ARRAY_INDEX_SCALE = 8
-PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR = null
+PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR = private java.nio.DirectByteBuffer(long,long)
 PlatformDependent0.EXPLICIT_NO_UNSAFE_CAUSE = null
-PlatformDependent0.ALLOCATE_ARRAY_METHOD = null
+PlatformDependent0.ALLOCATE_ARRAY_METHOD = public java.lang.Object jdk.internal.misc.Unsafe.allocateUninitializedArray(java.lang.Class,int)
 PlatformDependent0.ALIGN_SLICE = public final java.nio.ByteBuffer java.nio.ByteBuffer.alignedSlice(int)
 PlatformDependent0.JAVA_VERSION = 21
 PlatformDependent0.IS_ANDROID = false
 PlatformDependent0.STORE_FENCE_AVAILABLE = true
 PlatformDependent0.UNSAFE_UNAVAILABILITY_CAUSE = null
-PlatformDependent0.INTERNAL_UNSAFE = null
+PlatformDependent0.INTERNAL_UNSAFE = jdk.internal.misc.Unsafe@438c5457
 PlatformDependent0.RUNNING_IN_NATIVE_IMAGE = true
 PlatformDependent0.IS_EXPLICIT_TRY_REFLECTION_SET_ACCESSIBLE = true
-PlatformDependent0.UNSAFE = sun.misc.Unsafe@970d61f
+PlatformDependent0.UNSAFE = sun.misc.Unsafe@62bf4323
 PlatformDependent0.HASH_CODE_ASCII_SEED = -1028477387
 PlatformDependent0.HASH_CODE_C1 = -862048943
 PlatformDependent0.HASH_CODE_C2 = 461845907
@@ -31,7 +31,7 @@ PlatformDependent.MAYBE_SUPER_USER = false
 PlatformDependent.CAN_ENABLE_TCP_NODELAY_BY_DEFAULT = true
 PlatformDependent.UNSAFE_UNAVAILABILITY_CAUSE = null
 PlatformDependent.DIRECT_BUFFER_PREFERRED = true
-PlatformDependent.MAX_DIRECT_MEMORY = 24660934656
+PlatformDependent.MAX_DIRECT_MEMORY = 24593825792
 PlatformDependent.MPSC_CHUNK_SIZE = 1024
 PlatformDependent.MIN_MAX_MPSC_CAPACITY = 2048
 PlatformDependent.MAX_ALLOWED_MPSC_CAPACITY = 1073741824
@@ -40,22 +40,22 @@ PlatformDependent.TMPDIR = /tmp
 PlatformDependent.BIT_MODE = 64
 PlatformDependent.NORMALIZED_ARCH = x86_64
 PlatformDependent.NORMALIZED_OS = linux
-PlatformDependent.ALLOWED_LINUX_OS_CLASSIFIERS = [Ljava.lang.String;@52d3654e
+PlatformDependent.ALLOWED_LINUX_OS_CLASSIFIERS = [Ljava.lang.String;@20d2ffcc
 PlatformDependent.LINUX_OS_CLASSIFIERS = [fedora]
 PlatformDependent.IS_WINDOWS = false
 PlatformDependent.IS_OSX = false
 PlatformDependent.IS_J9_JVM = false
 PlatformDependent.IS_IVKVM_DOT_NET = false
 PlatformDependent.ADDRESS_SIZE = 8
-PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER = false
-PlatformDependent.DIRECT_MEMORY_COUNTER = null
-PlatformDependent.DIRECT_MEMORY_LIMIT = 24660934656
-PlatformDependent.RANDOM_PROVIDER = io.netty.util.internal.PlatformDependent$2@71b6d77f
-PlatformDependent.CLEANER = io.netty.util.internal.CleanerJava9@164d4f40
-PlatformDependent.UNINITIALIZED_ARRAY_ALLOCATION_THRESHOLD = -1
-PlatformDependent.OS_RELEASE_FILES = [Ljava.lang.String;@47c35bd3
+PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER = true
+PlatformDependent.DIRECT_MEMORY_COUNTER = 0
+PlatformDependent.DIRECT_MEMORY_LIMIT = 24593825792
+PlatformDependent.RANDOM_PROVIDER = io.netty.util.internal.PlatformDependent$2@15f1832f
+PlatformDependent.CLEANER = io.netty.util.internal.CleanerJava9@3d0d87db
+PlatformDependent.UNINITIALIZED_ARRAY_ALLOCATION_THRESHOLD = 1024
+PlatformDependent.OS_RELEASE_FILES = [Ljava.lang.String;@59fc3392
 PlatformDependent.LINUX_ID_PREFIX = ID=
 PlatformDependent.LINUX_ID_LIKE_PREFIX = ID_LIKE=
 PlatformDependent.BIG_ENDIAN_NATIVE_ORDER = false
-PlatformDependent.NOOP = io.netty.util.internal.PlatformDependent$1@e515370
+PlatformDependent.NOOP = io.netty.util.internal.PlatformDependent$1@26842a93
 PlatformDependent.$assertionsDisabled = true

Native mode run-time-initialized

diff --git a/native-run-time-init-mode.txt b/native-run-time-init-mode-jfr.txt
index 5a1aa1f..c31a84c 100644
--- a/native-run-time-init-mode.txt
+++ b/native-run-time-init-mode-jfr.txt
@@ -6,7 +6,7 @@ PlatformDependent0.INT_ARRAY_BASE_OFFSET = 16
 PlatformDependent0.INT_ARRAY_INDEX_SCALE = 4
 PlatformDependent0.LONG_ARRAY_BASE_OFFSET = 16
 PlatformDependent0.LONG_ARRAY_INDEX_SCALE = 8
-PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR = null
+PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR = private java.nio.DirectByteBuffer(long,long)
 PlatformDependent0.EXPLICIT_NO_UNSAFE_CAUSE = null
 PlatformDependent0.ALLOCATE_ARRAY_METHOD = null
 PlatformDependent0.ALIGN_SLICE = public final java.nio.ByteBuffer java.nio.ByteBuffer.alignedSlice(int)
@@ -14,10 +14,10 @@ PlatformDependent0.JAVA_VERSION = 21
 PlatformDependent0.IS_ANDROID = false
 PlatformDependent0.STORE_FENCE_AVAILABLE = true
 PlatformDependent0.UNSAFE_UNAVAILABILITY_CAUSE = null
-PlatformDependent0.INTERNAL_UNSAFE = null
+PlatformDependent0.INTERNAL_UNSAFE = jdk.internal.misc.Unsafe@147a7def
 PlatformDependent0.RUNNING_IN_NATIVE_IMAGE = true
 PlatformDependent0.IS_EXPLICIT_TRY_REFLECTION_SET_ACCESSIBLE = true
-PlatformDependent0.UNSAFE = sun.misc.Unsafe@744c41dd
+PlatformDependent0.UNSAFE = sun.misc.Unsafe@37286f61
 PlatformDependent0.HASH_CODE_ASCII_SEED = -1028477387
 PlatformDependent0.HASH_CODE_C1 = -862048943
 PlatformDependent0.HASH_CODE_C2 = 461845907
@@ -40,22 +40,22 @@ PlatformDependent.TMPDIR = /tmp
 PlatformDependent.BIT_MODE = 64
 PlatformDependent.NORMALIZED_ARCH = x86_64
 PlatformDependent.NORMALIZED_OS = linux
-PlatformDependent.ALLOWED_LINUX_OS_CLASSIFIERS = [Ljava.lang.String;@656445c5
+PlatformDependent.ALLOWED_LINUX_OS_CLASSIFIERS = [Ljava.lang.String;@48221138
 PlatformDependent.LINUX_OS_CLASSIFIERS = [fedora]
 PlatformDependent.IS_WINDOWS = false
 PlatformDependent.IS_OSX = false
 PlatformDependent.IS_J9_JVM = false
 PlatformDependent.IS_IVKVM_DOT_NET = false
 PlatformDependent.ADDRESS_SIZE = 8
-PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER = false
-PlatformDependent.DIRECT_MEMORY_COUNTER = null
+PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER = true
+PlatformDependent.DIRECT_MEMORY_COUNTER = 0
 PlatformDependent.DIRECT_MEMORY_LIMIT = 53206843392
-PlatformDependent.RANDOM_PROVIDER = io.netty.util.internal.PlatformDependent$2@301cc3e2
-PlatformDependent.CLEANER = io.netty.util.internal.CleanerJava9@8905ee9
+PlatformDependent.RANDOM_PROVIDER = io.netty.util.internal.PlatformDependent$2@6f6af829
+PlatformDependent.CLEANER = io.netty.util.internal.CleanerJava9@140b865c
 PlatformDependent.UNINITIALIZED_ARRAY_ALLOCATION_THRESHOLD = -1
-PlatformDependent.OS_RELEASE_FILES = [Ljava.lang.String;@5b464f2a
+PlatformDependent.OS_RELEASE_FILES = [Ljava.lang.String;@70835f0e
 PlatformDependent.LINUX_ID_PREFIX = ID=
 PlatformDependent.LINUX_ID_LIKE_PREFIX = ID_LIKE=
 PlatformDependent.BIG_ENDIAN_NATIVE_ORDER = false
-PlatformDependent.NOOP = io.netty.util.internal.PlatformDependent$1@2cd8284d
+PlatformDependent.NOOP = io.netty.util.internal.PlatformDependent$1@5d61e8ac
 PlatformDependent.$assertionsDisabled = true

FTR I am using https://github.com/zakkak/issue-reproducers/tree/2023-12-07-mandrel-it-tests-236 to get these.

zakkak commented 6 months ago

Enabling DEBUG logging doesn't show anything interesting

I take that back...

When jfr monitoring is enabled I see

java.nio.Bits.unaligned: available, true

While when it is disabled I see:

java.nio.Bits.unaligned: unavailable, true, Unable to make static boolean java.nio.Bits.unaligned() accessible: module java.base does not "opens java.nio" to unnamed module @21bcffb5

which indicates that enabling jfr monitoring somehow results in java.base opening java.nio to the unnamed module used by Quarkus.

zakkak commented 6 months ago

@roberttoyonaga does this ^^ ring any bells?

roberttoyonaga commented 6 months ago

does this ^^ ring any bells?

hmm not really

roberttoyonaga commented 6 months ago

@zakkak So just to confirm, you're seeing that build-time init of io.netty.util.internal.PlatformDependent0 is fine (with and without JFR enabled). But runtime init/re-init causes issues?

The only part of the diff you shared here that immediately strikes me as suspicious is that build time init with JFR has:

-PlatformDependent0.ALLOCATE_ARRAY_METHOD = null
+PlatformDependent0.ALLOCATE_ARRAY_METHOD = public java.lang.Object jdk.internal.misc.Unsafe.allocateUninitializedArray(java.lang.Class,int)

While runtime init has no diff: PlatformDependent0.ALLOCATE_ARRAY_METHOD = null

Maybe this is not an issue, but I've fixed bugs in the past related to allocation events causing issues at startup (when current_thread hasn't yet been assigned or other things are not set up yet). Maybe something along those lines is happening since it seems like quarkus is failing to respond to the first request. It's just a hunch though. I can try to look into it more after the serviceability meetup.

zakkak commented 6 months ago

@zakkak So just to confirm, you're seeing that build-time init of io.netty.util.internal.PlatformDependent0 is fine (with and without JFR enabled). But runtime init/re-init causes issues?

Correct, but only when JFR is enabled.

The only part of the diff you shared here that immediately strikes me as suspicious is that build time init with JFR has:

Yes that's interesting, but it still used to work with that diff (we used to build-time-initialize these classes without observing CI failures, but now we run-time-initialize them and we get this CI failure).

The comparison of run-time-init (current approach of Quarkus for these classes) with and without JFR in https://github.com/Karm/mandrel-integration-tests/issues/236#issuecomment-1846237567 is also pretty interesting:

-PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR = null
+PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR = private java.nio.DirectByteBuffer(long,long)
-PlatformDependent0.INTERNAL_UNSAFE = null
+PlatformDependent0.INTERNAL_UNSAFE = jdk.internal.misc.Unsafe@147a7def
-PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER = false
+PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER = true
-PlatformDependent.DIRECT_MEMORY_COUNTER = null
+PlatformDependent.DIRECT_MEMORY_COUNTER = 0

My next step would be to check what is enabled in GraalVM / Mandrel when enabling jfr monitoring and why it affects netty's run-time initialization.

zakkak commented 6 months ago

which indicates that enabling jfr monitoring somehow results in java.base opening java.nio to the unnamed module used by Quarkus.

Confirmed, it happens here https://github.com/oracle/graal/blob/4dd8a9d611b78668bb84116cf7ff8cc7e4146907/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/JfrFeature.java#L156

I will try to do the same in Quarkus to see how it will behave.

zakkak commented 6 months ago

I found what's the route cause, see https://github.com/quarkusio/quarkus/issues/37626, but I still have not a clear understanding of why (which code-flow it triggers) we see failures only with JFR enabled.

roberttoyonaga commented 6 months ago

Nice! And I guess setting the workaround of io.netty.tryReflectionSetAccessible=false prevents it from trying to continue without the correct configuration. Yeah it doesn't yet make sense how adding JFR contributes to this problem.

zakkak commented 6 months ago

FTR The stack trace in Quarkus leading to the failure is:

2023-12-09 00:36:04,630 TRACE [io.ver.cor.htt.imp.HttpServerImpl] (vert.x-eventloop-thread-0) Connection failure: java.lang.Error: java.lang.NullPointerException
    at io.netty.util.internal.PlatformDependent0.newDirectBuffer(PlatformDependent0.java:620)
    at io.netty.util.internal.PlatformDependent0.allocateDirectNoCleaner(PlatformDependent0.java:579)
    at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:774)
    at io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf.allocateDirect(UnpooledUnsafeNoCleanerDirectByteBuf.java:30)
    at io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeNoCleanerDirectByteBuf.allocateDirect(UnpooledByteBufAllocator.java:186)
    at io.netty.buffer.UnpooledDirectByteBuf.<init>(UnpooledDirectByteBuf.java:64)
    at io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:41)
    at io.netty.buffer.UnpooledUnsafeNoCleanerDirectByteBuf.<init>(UnpooledUnsafeNoCleanerDirectByteBuf.java:25)
    at io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeNoCleanerDirectByteBuf.<init>(UnpooledByteBufAllocator.java:181)
    at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:91)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
    at io.netty.buffer.Unpooled.directBuffer(Unpooled.java:128)
    at io.netty.handler.codec.http.HttpObjectEncoder.<clinit>(HttpObjectEncoder.java:59)
    at java.base@21.0.1/java.lang.Class.ensureInitialized(DynamicHub.java:595)
    at java.base@21.0.1/java.lang.Class.ensureInitialized(DynamicHub.java:595)
    at io.vertx.core.http.impl.HttpServerWorker.configureHttp1OrH2C(HttpServerWorker.java:276)
    at io.vertx.core.http.impl.HttpServerWorker.handleHttp1(HttpServerWorker.java:209)
    at io.vertx.core.http.impl.HttpServerWorker.access$200(HttpServerWorker.java:50)
    at io.vertx.core.http.impl.HttpServerWorker$1.configure(HttpServerWorker.java:180)
    at io.vertx.core.http.impl.Http1xOrH2CHandler.end(Http1xOrH2CHandler.java:59)
    at io.vertx.core.http.impl.Http1xOrH2CHandler.channelRead(Http1xOrH2CHandler.java:38)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base@21.0.1/java.lang.Thread.runWith(Thread.java:1596)
    at java.base@21.0.1/java.lang.Thread.run(Thread.java:1583)
    at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:832)
    at org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine(PosixPlatformThreads.java:211)
Caused by: java.lang.NullPointerException
    at io.netty.util.internal.PlatformDependent0.newDirectBuffer(PlatformDependent0.java:614)
    ... 44 more

Which is the result of PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER being set (and observed by other classes) at build time while PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR is null at run time (in Quarkus, because in my tests above it appears to be set :/)

DEBUG [io.net.uti.int.PlatformDependent0] (Thread-3) java.nio.DirectByteBuffer.<init>(long, {int,long}): unavailable
roberttoyonaga commented 6 months ago

but I still have not a clear understanding of why (which code-flow it triggers) we see failures only with JFR enabled.

Is it possible that java.base opening java.nio to the unnamed module is the reason that it only fails with JFR included at build time? When JFR is enabled, PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR = private java.nio.DirectByteBuffer(long,long), which causes PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER to be set, resulting the problems at runtime due to missing reflection config. Maybe when JFR is not enabled and java.base doesn't open java.nio, PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR must remain null (as shown in your diff) because java.nio.DirectByteBuffer can't be reflectively accessed. That would result in avoiding the problem you explained above.

If I remove https://github.com/oracle/graal/blob/4dd8a9d611b78668bb84116cf7ff8cc7e4146907/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/JfrFeature.java#L156 , then everything seems to work fine (I get the expected "hello" response).

zakkak commented 6 months ago

Is it possible that java.base opening java.nio to the unnamed module is the reason that it only fails with JFR included at build time?

Yes, although it's still not clear to me why. Thanks for confirming this with a modified GraalVM version.

zakkak commented 6 months ago

Closed by https://github.com/quarkusio/quarkus/pull/37633

I will create a new issue for the discrepancies when using --enable-monitoring=jfr