Closed franz1981 closed 5 months ago
Reading https://github.com/FasterXML/jackson-core/issues/479 that contains some background about it
I see that the mentioned issue above perfectly answer my question and I can close this issue, although in my experience, hot path getAndSet
are far distant from being free, especially on x86, which imply costy xchg
instructions to be emitted.
What's likely is that whatever benchmark/test wasn't stressing enough this, making it negligible if compared to other bottleneck: and that's fine; it means too that fixing this shouldn't bring noticeable improvements till other bottlenecks will be addressed first, if that previous analysis is still valid now (that's 2023 vs 2019, I suppose many things has changed from there).
@cowtowncoder
Based on https://github.com/FasterXML/jackson-core/issues/479#issuecomment-525143217 it mentions jackson-benchmarks
and I'll give it a short: I'm going to report here or in another issue depending if relevant for the current topic.
Right, the short answer is that non-blocking/async use cases would be problematic; more than one JsonParser
or JsonGenerator
would get access to same buffer, and although they'd not run concurrently (bound to thread) they would interfere/cause corruption.
although they'd not run concurrently (bound to thread) they would interfere/cause corruption.
I don't understand this; if no concurrent access happen because they interleave correctly based on external as-sequential order, no atomic ops should be needed - I'm not a natural english speaker so maybe I've misunderstood your comment too. I'll read again the referenced issue to better understand, thanks!
although they'd not run concurrently (bound to thread) they would interfere/cause corruption.
I don't understand this; if no concurrent access happen because they interleave correctly based on external as-sequential order, no atomic ops should be needed - I'm not a natural english speaker so maybe I've misunderstood your comment too. I'll read again the referenced issue to better understand, thanks!
Ok. So, the problem is that effectively access to BufferRecyler
WOULD come from different threads, because JsonParser
/JsonGenerator
instances were no longer bound to specific thread. So parser#1 on thread#1 would get an instance of BufferRecycler
, start using it. But then another parser, parser#2 would be created on thread#1 as well; and in the meantime parser#1 may be running on different unrelated thread.
This happens on async processing when different stages of processing are done on different set of execution threads.
Put another way: life-cycles of individual parsers/generators overlap, and they do not stay thread-bound in non-blocking use scenarios.
And because of this, access to buffers within BufferRecyclers
started to need syncing that was not needed for blocking use cases where the issue usually does not occur (but I guess with different thread pools they actually can? Maybe it's not even just async usage).
(but I guess with different thread pools they actually can? Maybe it's not even just async usage).
I think that if the owner of the BufferRecycler
escape its thread where it has been allocated/recyled and the same BufferRecycler
instance is shared and was accessible by others, it can still happen (that's the same problem with reactive libraries that don't do thread confinement, instead of running on the same event loop over and over - that's what Mutiny does IIRC).
So, in theory the problem is the sharing part: this can change if the thread local BufferRecycler
can be "consumed" while acquired, and, if on the same thread, another JsonParser/JsonGenerator
will try to acquire it, it has to create/reuse if avaialble another one.
This will make the BufferRecycler
to not be shared anymore and it won't requires any atomic op, but will likely make, with a reactive stack, to have multiple BufferRecycler
s per thread (base on the concurrency level i.e. how many concurrent reactive stages are using different ones), but it shouldn't be a big deal (right now if BufferRecycler
s components getAndSet
return null
it will create a new one, so, simlarly here, different strategies could be used to make everyone happy - including limiting the number of BufferRecycler
instances).
What could happen with such approach is that, while the BufferRecycler
will be "recycled" (something that doesn't exist yet, with an explicit "close" IIUC), it should contains a reference of the originating ("owner" let's call it) thread (local? not important really), stored in a concurrent queue (multi producer, single consumer) and will dispose it there by offering it back, until specific limits and if the thread is not dead (the BufferRecycler
should have a soft/weak ref over Thread
to avoid keeping it alive more then necessary and/or offering a disposed recycler to someone no longer alive).
In a future the thread-local concurrent queue of BufferRecycler
can be replaced, for virtual threads, by a fixed size (or growable too) array of queues and can use some nice math trick to distribute which queues to picking it based on the thread-id (I've solved something similar on https://github.com/franz1981/netty/blob/2c9a513edbe9250cb9270d37c428e9931a63f524/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L387).
I see @cowtowncoder that I've been way to verbose to describe the idea, let me know if I have to rewrite it in a more concise way or if something isn't clear (my fault)
@franz1981 I think that I do not actually see all that much benefit in trying to solve the way buffer recycling currently works -- locking has overhead but I would not consider it a major problem.
But rather I think that the bigger problem is coming issue with ThreadLocal
s and inflexibility of current internal API. So to me refactoring of this API, to allow alternative implementations is the important part. Trying to figure out how BufferRecycler
s were not shared across Threads is probably fool's errand as it'd tie things more closely to ThreadLocal
(instead of less).
I think your last paragraph is nicely in line with what I think of future, fwtw; yes, the direction would be towards more centralized recycling (per-factory, but one pool, not using ThreadLocal).
Still, how to change in place for Jackson 2.x is non-trivial, and this is not at the top of my priority list either (it's high just not top).
I think that I do not actually see all that much benefit in trying to solve the way buffer recycling currently works -- locking has overhead but I would not consider it a major problem.
I though that having a simpler lifecycle (unshared, but own exclusively per-JsonParser/JsonGenerator
), decoupled from the owner thread, but just having a notion of "owner" (which API is unique and got diff impl if is threadlocal thing or anything different) for BufferRecycler
would benefit the code more then the performance aspects.
In this way the meaning of being "recycled" means "unshared" and in a quiescent state and we're trusty (unless users misues, that probably we should still capture somehow) that we can do whatever we want with it.
I think your last paragraph is nicely in line with what I think of future, fwtw; yes, the direction would be towards more centralized recycling (per-factory, but one pool, not using ThreadLocal).
Yep, I can come up with something related this, but myself got few things on my place recently, although I can work on this in background.
Still, how to change in place for Jackson 2.x is non-trivial, and this is not at the top of my priority list either (it's high just not top).
Don't worry I perfectly understand/relate to this :) I'm very interested into your comment on https://github.com/FasterXML/jackson-core/issues/919#issuecomment-1425028482
The way release would have to happen, I think, is with close() of JsonParser/JsonGenerator instance, clearing local BufferRecycler first, then calling recycler.release() method
if I can apply this change in the right way this would unblock the rest (not in a release eh, I mean in fork to start with)
@franz1981 yes, if you want to go ahead, propose something that'd be a good way to go.
So the idea would be for a single parser/generator to own BufferRecycler
. Ideally I guess there'd be separate reader-/writer-side recyclers, really (now that there's no shared ownership, so half of buffers are always unreferenced), but that would require too big an API change.
Considering all dataformat backends that need retrofitting, and where ideally we would have some level of compatibility across "adjacent" minor versions.
Has been inactive for a while; in the meantime pluggable src/main/java/com/fasterxml/jackson/core/util/RecyclerPool.java
(see f.ex #1089) has been added to allow configuring life-cycle aspects.
Will close this issue; may be re-opened/re-filed if further progress is expected.
According to https://github.com/FasterXML/jackson-core/blob/31d4d85dcb25ca6699294db69591388470b9f8fc/src/main/java/com/fasterxml/jackson/core/util/BufferRecycler.java#L133 and https://github.com/FasterXML/jackson-core/blob/31d4d85dcb25ca6699294db69591388470b9f8fc/src/main/java/com/fasterxml/jackson/core/util/BufferRecycler.java#L159
BufferRecycler
, that IIUC is supposed to be thread-confined (and obtained via thread local allocation/recycling), can avoid using atomic instructions (that are quite costy in the hot path, especially if supposely single-threaded).I see that the reason is due to the
releaseXYZBuffer(int ix, ...)
that can (maybe) perform release from whatever thread...is it correct? Or there are other reasons?I'm preparing a patch, to get used with the codebase, in case :)