eclipse-vertx / vert.x

Vert.x is a tool-kit for building reactive applications on the JVM
http://vertx.io
Other
14.32k stars 2.08k forks source link

Try reuse existing Netty pooled allocator singleton (Fixes #5168) #5292

Closed franz1981 closed 3 weeks ago

franz1981 commented 2 months ago

Fixes #5168

This is part of https://github.com/eclipse-vertx/vert.x/pull/5262

non-SSL connections configure PooledByteBufAllocator.DEFAULT as per https://github.com/eclipse-vertx/vert.x/blob/f42b5d22c5c738cae1db925a06029212b198c398/vertx-core/src/main/java/io/vertx/core/net/impl/NetServerImpl.java#L515-L519

In this scenario, hitting these call sites:

https://github.com/eclipse-vertx/vert.x/blob/65403248e1e5c679d5418d8040aafea5bd09d0ef/src/main/java/io/vertx/core/net/impl/VertxHandler.java#L61

https://github.com/eclipse-vertx/vert.x/blob/f42b5d22c5c738cae1db925a06029212b198c398/vertx-core/src/main/java/io/vertx/core/http/impl/Http2ConnectionBase.java#L59

or

https://github.com/eclipse-vertx/vert.x/blob/f42b5d22c5c738cae1db925a06029212b198c398/vertx-core/src/main/java/io/vertx/core/buffer/impl/BufferImpl.java#L49-L54

referencing VertxByteBufAllocator.DEFAULT , it triggers the initialization of VertxByteBufAllocator.POOLED_ALLOCATOR as well and by consequence its inner structures - making it a GC Root.

It results into a duplication of allocators i.e. PooledByteBufAllocator.DEFAULT and VertxByteBufAllocator.POOLED_ALLOCATOR.

This patch aim to reuse the existing singleton i.e. PooledByteBufAllocator.DEFAULT if the user doesn't mess-up with -Dio.netty.noPreferDirect, in order to guarantee the original behaviour to be preserved.

cescoffier commented 2 months ago

Hold on before merging this one. We need to discuss versioning schemes.

franz1981 commented 2 months ago

@cescoffier @vietj did you decided what could be best to do? :pray:

cescoffier commented 2 months ago

Right now, it would be Vertx 5 only, as we won't have a Vertx 4.6

franz1981 commented 2 months ago

@cescoffier For quarkus - what is the expected time frame where vertx 5 can be integrated? Remember that this change I did here could easily be backported too, given that it brings tangible advantage with zero effort, in quarkus

cescoffier commented 2 months ago

For Quarkus, Vert.x 5 integration will start in spring 2025 and end (optimistically) in fall 2025.

The fact that we don't really know when useDirect is true or false (not the same value in tests, if you use JPMS, if you have access to usafe ... ) and that it can drastically change the behavior makes it hard to accept in a version that is intended to be used in the LTS.

franz1981 commented 2 months ago

The fact that we don't really know when useDirect is true or false (not the same value in tests, if you use JPMS, if you have access to usafe ... ) and that it can drastically change the behavior makes it hard to accept in a version that is intended to be used in the LTS.

This is the same that will do Netty under the hood actually; it's not at all related to Unsafe but is instead related the sys property -Dio.netty.noPreferDirect (unrelated to unsafe) - and accessible with PooledByteBufAllocator.defaultPreferDirect (which is agnostic to unsafe).

Currently Netty allocates its own singleton(s) reading the same property; the point of checking it upfront here is to preserve the same behaviour the vertx users has benefitted till now.

If this is still not enough to mitigate the risk; I can think of another patch which further narrow the problem - with clearly some tradeoffs, let me know your opinion :+1:

cescoffier commented 2 months ago

It's not what we saw with Julien yesterday, where we got prefer direct true in some cases and false in other cases - without setting any system property.

cescoffier commented 2 months ago

I will only accept the risk if it does not change the current default except if explicitly configured (like with a system property) - but as I said - it seems to set the default value differently depending on various things (like unsafe, JVM version, JPMS, NEtty 4.1 vs. 4.2...)

franz1981 commented 2 months ago

I can see why @cescoffier thanks for sharing, that helped:

It is likely due to the presence of cleaner's API accessible, see https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L202

which transitively depends by Unsafe presence, see https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/CleanerJava9.java#L79

that raise some concern regardless this PR, because, right now in Vertx:

  1. without SSL: if unsafe is not present -> cleaner is noop -> Vertx uses Netty's allocator singleton which does prefer heap pooled buffers: this is bad, see https://github.com/netty/netty/blob/7893d5f150aefc99bc720387875395afad94d807/codec-http/src/main/java/io/netty/handler/codec/http/HttpObjectEncoder.java#L328 causing Heap arenas to be allocated (which cost heap, not off-heap) and additional on-the-flight direct buffers while interacting with NIO sockets i.e. more memory/heap used, more cpu used
  2. with SSL: if unsafe is not present -> cleaner is noop -> Vertx uses its own partial allocator which force using direct buffers (without a CLEANER! - can OOM a container if a direct arena is freed, cause we leave it at the mercy of the GC), but which doesn't pool heap buffers (and causing SSL JDK a performance issue)

In short: if Unsafe is not present, it is a disaster, from a performance perspective. So, if you have any run which have no Cleaner API accessible due to lack on Unsafe, we need to address it regardless, because it is bad, performance, but most importantly, stability-wise, for our users.

I'll perform some deeper analysis tomorrow, but my current conclusion is, with the current PR:

  1. the cleaner won't be available - we will get in the case 2, but for non SSL as well: this is a change, but not really, given that with SSL is already like that - and we still save the memory because the Netty singleton won't be used (but I have to make sure it will NEVER be referenced - which can cause allocating it)
  2. if the cleaner will be available - nothing change from now, but we will save to allocate the vertx allocator, saving memory

This is what happen in detail with this PR, is it more clear? I wasn't fully aware of all the changes

cescoffier commented 2 months ago

Well, and it will likely be different between JVM and native. We need more investigation to understand all the consequences, even if today's solution is far from ideal.

As I said, we cannot change the behavior in an LTS.

franz1981 commented 2 months ago

@cescoffier

Well, and it will likely be different between JVM and native.

which is kind of surprising (to me - because I didn't profiled native image apps enough - mea culpa :( )

We need more investigation to understand all the consequences, even if today's solution is far from ideal.

+100 And I'll fill some issue with all the consequences and details: this stuff is NOT documented in Netty and only knowing how it works help - so better doing it once for "all" , if you agree

As I said, we cannot change the behavior in an LTS.

I'll pick the choice of using a sys property to not impact anyone - but I still need to dug into this enough to make sure what we get/loose and risks

franz1981 commented 2 months ago

@cescoffier @zakkak

I'm a bit concerned about

It's not what we saw with Julien yesterday, where we got prefer direct true in some cases and false in other cases - without setting any system property.

In JDK 17, 21 and native image with 21-graal the behaviour is as expected, see

image

and at

image

where you can spot AbstractByteBufAllocator::buffer calling AbstractByteBufAllocator::directBuffer

and allocating io.netty.buffer.PooledUnsafeDirectByteBuf which is what we would expected.

in short: native image with latest quarkus and 21-graal is correctly finding both Unsafe and the Cleaner and using the "right" buffers (using the Cleaners).

@cescoffier @vietj

can you help clarifying in which context both have observed defaultPreferDirect to return false?

I believe instead that you observed a different thing, let me show what's my hypothesis:

In short:

franz1981 commented 2 months ago

PTAL @cescoffier

I've added a new property vertx.reuseNettyAllocators, which is false by default, to preserve the existing behaviour. When merged in Quarkus we can set this property to true saving the doubling of RSS mentioned at https://github.com/quarkusio/quarkus/issues/42224

If the approach is sound I can add a similar sys property to allow Netty heap buffers to be pooled with JDK SSL engine, fixing https://github.com/quarkusio/quarkus/issues/41880#issuecomment-2258734835 TLDR poor performance with JDK SSL due to big heap allocations in the hot path.

cescoffier commented 2 months ago

That looks more acceptable, but I do not see where it's documented and tested (mainly because we may detect behavior change between Vert.x 4.x and 5.x)

franz1981 commented 2 months ago

Given that it depends by a static final property, if I set it to try it, is going to impact all the tests after it, randomly (because it depends how junit order test execution). I have tested that the default has not changed, instead. We can speak with @vietj on how to make it testable, without running the whole test suite again with this property in. Consider that it doesn't change the classes used to perform the pooled allocations, but just which instance and currently there are no specific tests on vertx to cover what exist now. I am opened to suggestions.

cescoffier commented 2 months ago

We may have a bit more flexibility to test it in Quarkus (I don't care where the tests are, just that we need some coverage)

franz1981 commented 1 month ago

@vietj Any other concern for this @cescoffier @vietj ?