why BufferRecycler uses atomic ops on its buffers?

franz1981 commented 1 year ago

According to https://github.com/FasterXML/jackson-core/blob/31d4d85dcb25ca6699294db69591388470b9f8fc/src/main/java/com/fasterxml/jackson/core/util/BufferRecycler.java#L133 and https://github.com/FasterXML/jackson-core/blob/31d4d85dcb25ca6699294db69591388470b9f8fc/src/main/java/com/fasterxml/jackson/core/util/BufferRecycler.java#L159 BufferRecycler, that IIUC is supposed to be thread-confined (and obtained via thread local allocation/recycling), can avoid using atomic instructions (that are quite costy in the hot path, especially if supposely single-threaded).

I see that the reason is due to the releaseXYZBuffer(int ix, ...) that can (maybe) perform release from whatever thread...is it correct? Or there are other reasons?

I'm preparing a patch, to get used with the codebase, in case :)

franz1981 commented 1 year ago

Reading https://github.com/FasterXML/jackson-core/issues/479 that contains some background about it

franz1981 commented 1 year ago

I see that the mentioned issue above perfectly answer my question and I can close this issue, although in my experience, hot path getAndSet are far distant from being free, especially on x86, which imply costy xchg instructions to be emitted. What's likely is that whatever benchmark/test wasn't stressing enough this, making it negligible if compared to other bottleneck: and that's fine; it means too that fixing this shouldn't bring noticeable improvements till other bottlenecks will be addressed first, if that previous analysis is still valid now (that's 2023 vs 2019, I suppose many things has changed from there).

@cowtowncoder Based on https://github.com/FasterXML/jackson-core/issues/479#issuecomment-525143217 it mentions jackson-benchmarks and I'll give it a short: I'm going to report here or in another issue depending if relevant for the current topic.

cowtowncoder commented 1 year ago

Right, the short answer is that non-blocking/async use cases would be problematic; more than one JsonParser or JsonGenerator would get access to same buffer, and although they'd not run concurrently (bound to thread) they would interfere/cause corruption.

franz1981 commented 1 year ago

although they'd not run concurrently (bound to thread) they would interfere/cause corruption.

I don't understand this; if no concurrent access happen because they interleave correctly based on external as-sequential order, no atomic ops should be needed - I'm not a natural english speaker so maybe I've misunderstood your comment too. I'll read again the referenced issue to better understand, thanks!

cowtowncoder commented 1 year ago

although they'd not run concurrently (bound to thread) they would interfere/cause corruption.

I don't understand this; if no concurrent access happen because they interleave correctly based on external as-sequential order, no atomic ops should be needed - I'm not a natural english speaker so maybe I've misunderstood your comment too. I'll read again the referenced issue to better understand, thanks!

Ok. So, the problem is that effectively access to BufferRecyler WOULD come from different threads, because JsonParser/JsonGenerator instances were no longer bound to specific thread. So parser#1 on thread#1 would get an instance of BufferRecycler, start using it. But then another parser, parser#2 would be created on thread#1 as well; and in the meantime parser#1 may be running on different unrelated thread. This happens on async processing when different stages of processing are done on different set of execution threads.

Put another way: life-cycles of individual parsers/generators overlap, and they do not stay thread-bound in non-blocking use scenarios.

And because of this, access to buffers within BufferRecyclers started to need syncing that was not needed for blocking use cases where the issue usually does not occur (but I guess with different thread pools they actually can? Maybe it's not even just async usage).

franz1981 commented 1 year ago

(but I guess with different thread pools they actually can? Maybe it's not even just async usage).

I think that if the owner of the BufferRecycler escape its thread where it has been allocated/recyled and the same BufferRecycler instance is shared and was accessible by others, it can still happen (that's the same problem with reactive libraries that don't do thread confinement, instead of running on the same event loop over and over - that's what Mutiny does IIRC). So, in theory the problem is the sharing part: this can change if the thread local BufferRecycler can be "consumed" while acquired, and, if on the same thread, another JsonParser/JsonGenerator will try to acquire it, it has to create/reuse if avaialble another one. This will make the BufferRecycler to not be shared anymore and it won't requires any atomic op, but will likely make, with a reactive stack, to have multiple BufferRecyclers per thread (base on the concurrency level i.e. how many concurrent reactive stages are using different ones), but it shouldn't be a big deal (right now if BufferRecyclers components getAndSet return null it will create a new one, so, simlarly here, different strategies could be used to make everyone happy - including limiting the number of BufferRecycler instances).

What could happen with such approach is that, while the BufferRecycler will be "recycled" (something that doesn't exist yet, with an explicit "close" IIUC), it should contains a reference of the originating ("owner" let's call it) thread (local? not important really), stored in a concurrent queue (multi producer, single consumer) and will dispose it there by offering it back, until specific limits and if the thread is not dead (the BufferRecycler should have a soft/weak ref over Thread to avoid keeping it alive more then necessary and/or offering a disposed recycler to someone no longer alive).

In a future the thread-local concurrent queue of BufferRecycler can be replaced, for virtual threads, by a fixed size (or growable too) array of queues and can use some nice math trick to distribute which queues to picking it based on the thread-id (I've solved something similar on https://github.com/franz1981/netty/blob/2c9a513edbe9250cb9270d37c428e9931a63f524/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L387).

franz1981 commented 1 year ago

I see @cowtowncoder that I've been way to verbose to describe the idea, let me know if I have to rewrite it in a more concise way or if something isn't clear (my fault)

cowtowncoder commented 1 year ago

@franz1981 I think that I do not actually see all that much benefit in trying to solve the way buffer recycling currently works -- locking has overhead but I would not consider it a major problem.

But rather I think that the bigger problem is coming issue with ThreadLocals and inflexibility of current internal API. So to me refactoring of this API, to allow alternative implementations is the important part. Trying to figure out how BufferRecyclers were not shared across Threads is probably fool's errand as it'd tie things more closely to ThreadLocal (instead of less).

I think your last paragraph is nicely in line with what I think of future, fwtw; yes, the direction would be towards more centralized recycling (per-factory, but one pool, not using ThreadLocal).

Still, how to change in place for Jackson 2.x is non-trivial, and this is not at the top of my priority list either (it's high just not top).

franz1981 commented 1 year ago

I think that I do not actually see all that much benefit in trying to solve the way buffer recycling currently works -- locking has overhead but I would not consider it a major problem.

I though that having a simpler lifecycle (unshared, but own exclusively per-JsonParser/JsonGenerator), decoupled from the owner thread, but just having a notion of "owner" (which API is unique and got diff impl if is threadlocal thing or anything different) for BufferRecycler would benefit the code more then the performance aspects. In this way the meaning of being "recycled" means "unshared" and in a quiescent state and we're trusty (unless users misues, that probably we should still capture somehow) that we can do whatever we want with it.

I think your last paragraph is nicely in line with what I think of future, fwtw; yes, the direction would be towards more centralized recycling (per-factory, but one pool, not using ThreadLocal).

Yep, I can come up with something related this, but myself got few things on my place recently, although I can work on this in background.

Still, how to change in place for Jackson 2.x is non-trivial, and this is not at the top of my priority list either (it's high just not top).

Don't worry I perfectly understand/relate to this :) I'm very interested into your comment on https://github.com/FasterXML/jackson-core/issues/919#issuecomment-1425028482

The way release would have to happen, I think, is with close() of JsonParser/JsonGenerator instance, clearing local BufferRecycler first, then calling recycler.release() method

if I can apply this change in the right way this would unblock the rest (not in a release eh, I mean in fork to start with)

cowtowncoder commented 1 year ago

@franz1981 yes, if you want to go ahead, propose something that'd be a good way to go.

So the idea would be for a single parser/generator to own BufferRecycler. Ideally I guess there'd be separate reader-/writer-side recyclers, really (now that there's no shared ownership, so half of buffers are always unreferenced), but that would require too big an API change. Considering all dataformat backends that need retrofitting, and where ideally we would have some level of compatibility across "adjacent" minor versions.

cowtowncoder commented 5 months ago

Has been inactive for a while; in the meantime pluggable src/main/java/com/fasterxml/jackson/core/util/RecyclerPool.java (see f.ex #1089) has been added to allow configuring life-cycle aspects.

Will close this issue; may be re-opened/re-filed if further progress is expected.

FasterXML / jackson-core

why BufferRecycler uses atomic ops on its buffers? #920