franz1981 opened 7 months ago
With this as is, it is impossible to use the new adaptive allocator, too.
@franz1981 this might affect quarkus too
The reality seems more complex, see https://github.com/eclipse-vertx/vert.x/blob/7fdc3980c4e70bb71d3b27d471a41f17db581bca/vertx-core/src/main/java/io/vertx/core/net/impl/NetServerImpl.java#L516-L518
In short: with SSL we use a different allocator (here -> https://github.com/eclipse-vertx/vert.x/blob/master/vertx-core/src/main/java/io/vertx/core/buffer/impl/PartialPooledByteBufAllocator.java), which uses the custom Vert.x allocator mentioned at https://github.com/eclipse-vertx/vert.x/issues/5168#issue-2208289433; without SSL, instead, we just use the default Netty allocator.
To enable the adaptive one we should "at least" move to a different entry point and use the Netty-configured "default" one, i.e. https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBufAllocator.java#L24
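For context, Netty picks `ByteBufAllocator.DEFAULT` based on the `io.netty.allocator.type` system property (values like `pooled`, `unpooled`, and, in recent 4.1 releases, `adaptive`). A minimal plain-Java sketch of that selection logic, with no Netty dependency; the `AllocatorKind` enum is an illustrative stand-in, not Netty's actual type:

```java
// Simplified sketch of how Netty's ByteBufUtil chooses the default allocator
// from the "io.netty.allocator.type" system property. Netty returns real
// ByteBufAllocator instances; the enum below is only illustrative.
public class DefaultAllocatorChoice {

    enum AllocatorKind { POOLED, UNPOOLED, ADAPTIVE }

    static AllocatorKind pickDefault() {
        String type = System.getProperty("io.netty.allocator.type", "pooled");
        switch (type.trim().toLowerCase()) {
            case "unpooled": return AllocatorKind.UNPOOLED;
            case "adaptive": return AllocatorKind.ADAPTIVE;
            default:         return AllocatorKind.POOLED; // Netty's usual default
        }
    }

    public static void main(String[] args) {
        System.setProperty("io.netty.allocator.type", "adaptive");
        System.out.println(pickDefault()); // prints ADAPTIVE
    }
}
```

The property must be set before any Netty buffer class is initialized, since the real selection happens in a static initializer.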
@vietj @cescoffier this is something to fix on quarkus as well i think
This can affect Quarkus as well if we don't reuse the allocator configured on the Netty instance, which could be different from the singleton at PooledByteBufAllocator.DEFAULT, causing both to be allocated (and paying twice the RSS).
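To illustrate the double-RSS concern with a toy model (not Netty code; sizes and chunk counts are made up): each pooled allocator instance eagerly reserves its own arena chunks, so two independent singletons pay the reservation twice:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of arena-based pooling: each allocator instance reserves its own
// chunks up front, so two independent singletons double the resident footprint.
// Sizes are illustrative, not Netty's real arena/chunk parameters.
public class TwinPools {

    static final int CHUNK_SIZE = 1 << 20; // 1 MiB per chunk (illustrative)

    static class ToyPooledAllocator {
        final List<byte[]> arenaChunks = new ArrayList<>();

        ToyPooledAllocator(int chunks) {
            for (int i = 0; i < chunks; i++) {
                arenaChunks.add(new byte[CHUNK_SIZE]); // reserved eagerly
            }
        }

        long reservedBytes() {
            return (long) arenaChunks.size() * CHUNK_SIZE;
        }
    }

    public static void main(String[] args) {
        ToyPooledAllocator nettyDefault = new ToyPooledAllocator(4); // stand-in for PooledByteBufAllocator.DEFAULT
        ToyPooledAllocator vertxOwn     = new ToyPooledAllocator(4); // stand-in for a separate Vert.x pool
        long total = nettyDefault.reservedBytes() + vertxOwn.reservedBytes();
        // Twice the reservation of a single shared pool:
        System.out.println(total == 2 * nettyDefault.reservedBytes()); // prints true
    }
}
```

In the real case the waste is worse, since Netty sizes arenas per event-loop thread.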
@vietj @geoand In the case above what's the "idiomatic" way for quarkus to obtain the configured allocator? I cannot see any chain of calls to get there, TBH.
I am not sure I follow this... What exactly are you after here?
Yep, so, in short: Vert.x hands the channel either PartialPooledByteBufAllocator.INSTANCE or the Netty PooledByteBufAllocator.DEFAULT, while AppendBuffer (and other code paths) just use the Netty one, i.e. PooledByteBufAllocator.DEFAULT.
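The split described in this thread (see the NetServerImpl link earlier) can be sketched as a simple chooser: with SSL, Vert.x installs its partial-pooled allocator on the channel; without SSL, the plain Netty default stays in effect. The types below are illustrative stand-ins, not the real Vert.x classes:

```java
// Illustrative sketch of the allocator split discussed above; the real logic
// lives in Vert.x's NetServerImpl, and these types are stand-ins.
public class AllocatorSplit {

    interface Allocator { String name(); }

    static final Allocator PARTIAL_POOLED = () -> "PartialPooledByteBufAllocator.INSTANCE";
    static final Allocator NETTY_DEFAULT  = () -> "PooledByteBufAllocator.DEFAULT";

    // With SSL the partial-pooled allocator is installed on the channel;
    // without SSL the Netty default is used.
    static Allocator channelAllocator(boolean sslEnabled) {
        return sslEnabled ? PARTIAL_POOLED : NETTY_DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(channelAllocator(true).name());  // prints PartialPooledByteBufAllocator.INSTANCE
        System.out.println(channelAllocator(false).name()); // prints PooledByteBufAllocator.DEFAULT
    }
}
```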
To solve this, we should obtain from Vert.x itself which configured allocator it uses, so we won't duplicate the RSS. Makes sense, @geoand?
Sure, that makes sense, but it won't really involve any Quarkus APIs or anything - we just need the extensions code to do the proper thing
IDK @geoand, it probably requires some API change down in Vert.x, to make sure we can query the configured allocator from Vert.x - or do you have a better idea in mind? I'm very open; IDK yet how to solve it, TBH
Sounds reasonable
I've tried a different less invasive approach to this: see https://github.com/eclipse-vertx/vert.x/pull/5262
@zekronium are you a quarkus or vertx user? Just to understand If I ask your feedback if you want to give adaptive a shot :)
98% Vert.x user with BoringSSL (tc-native) and JDK 21 (unsafe enabled with opens and useReflection=true) on top of that too. ~I also run virtual threads in some projects, which, based on the known benefits of the adaptive pooler, might be interesting to see if it helps~
In Quarkus, OpenSSL/BoringSSL is currently not supported per se, from what I've seen in the issues - right?
EDIT: Scratch the virtual thread part. If I recall correctly, Vert.x, even with VTs, performs all the IO on the event loops and returns you (Unsafe)HeapBuffers, which AFAIK are unpooled in all cases, and which this new allocator does not cover. So, because the virtual threads won't actually interact with the allocator, the benefit of potentially better performance with VTs won't be measurable in my case, if I got that right.
Exactly @zekronium - on VTs, probably the only benefit you could get is if you do things on your own using the Vert.x allocator. And clearly in Quarkus, because we use it for a few things (e.g. Jackson JSON encoding).
@franz1981 Is the Quarkus Jackson JSON encoding different from the Vert.x one?
Yes @zekronium, and it will get better with time, like wine :) See https://github.com/quarkusio/quarkus/pull/41063, where we bytecode-generate the serializers to avoid reflection ;)
Ah, reminds me of jsoniter-dynamic; they do similar things, but it's old and unmaintained now. Waiting for the backport :)
I have benchmarked our client application. Which method would you suggest I primarily look at as a common hot path for both adaptive and default, to allow a clear, direct comparison?
For apples to apples, newDirectBuffer, which I think is a good one to look at since it universally covers IO with our settings:
This test was run as a simple PoC with quite a light load for 2 minutes. With it, there seems to be about a ~3x improvement!
These parts seem to consume slightly more CPU time, but might be by design?
To better understand @zekronium :
This was a simple "feeler" test I ran using our request client for two minutes. I simply attached the IntelliJ profiler on my M1 Max Mac.
It was a realistic load run: about 1k rps out, 1k connections. Rough memory usage was about 2 GB with unsafe.
I will come back with a proper benchmark on proper hardware, but I hope this info is enough for now.
About your finding on the cost of buffer operations: some overhead is expected, because the adaptive buffers piggyback on the usual Netty ones, but this is still way too much. I will run some micro-benchmarks on Netty itself to see if we can improve there, and to verify it.
Currently, in Vert.x, https://github.com/eclipse-vertx/vert.x/blob/adbe97600cc6215f15ce2fac629c89f93618ca8f/src/main/java/io/vertx/core/buffer/impl/VertxByteBufAllocator.java#L25 is NOT reusing Netty's PooledByteBufAllocator.DEFAULT, causing the creation of more thread-local direct buffers and arenas and enlarging the RSS footprint of Vert.x applications for no reason. What's the reason behind this choice @vietj?
The reason why it should be changed is to "ease" the life of users and libraries which allocate Netty direct buffers through the Netty one and can end up allocating new arenas because of this.
If the aforementioned pool reused the Netty one, it would clearly get some additional contention, but it would save memory, which seems a reasonable trade-off.
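The trade-off in the last point can be made concrete with a toy sketch: sharing one pool across threads adds contention on its internal state but keeps a single reservation, while separate pools avoid cross-thread contention at twice the footprint. Entirely illustrative; this is not Netty's arena design:

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy trade-off model: one shared pool (contended counter, single reservation)
// versus two private pools (no cross-thread contention, double reservation).
public class PoolTradeOff {

    static class ToyPool {
        final long reservedBytes;
        final AtomicLong allocations = new AtomicLong(); // contended when shared

        ToyPool(long reservedBytes) { this.reservedBytes = reservedBytes; }

        void allocate() { allocations.incrementAndGet(); }
    }

    public static void main(String[] args) throws InterruptedException {
        long arena = 8L << 20; // 8 MiB reservation per pool (illustrative)

        // Shared: both workers hit the same pool's atomic counter.
        ToyPool shared = new ToyPool(arena);
        Thread a = new Thread(() -> { for (int i = 0; i < 1000; i++) shared.allocate(); });
        Thread b = new Thread(() -> { for (int i = 0; i < 1000; i++) shared.allocate(); });
        a.start(); b.start(); a.join(); b.join();

        // Separate: each worker gets its own pool; reservation doubles.
        ToyPool p1 = new ToyPool(arena);
        ToyPool p2 = new ToyPool(arena);

        System.out.println(shared.allocations.get());            // 2000: all traffic funneled into one pool
        System.out.println(p1.reservedBytes + p2.reservedBytes); // twice the shared footprint
    }
}
```

Whether the extra contention matters depends on allocation rate per event loop, which is why the thread above treats memory savings as the safer default.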