franz1981 closed this issue 11 months ago.
Each of the 2 proposed solutions has its own (non-critical) problems anyway: Scoped Values (as context storage).

Yes, ThreadLocals will need to go. The problem is work prioritisation. Maybe Quarkus users would be better advised to try other JSON libs that are further ahead of the curve on Loom support. Loom users will need to recognise that these old projects weren't written with Loom in mind.
I think it would be in most people's interest (certainly that of our Quarkus users, who love Jackson) to work together to come up with a solution instead of having to settle for some lesser library.
@pjfanning thanks for the quick feedback; we're very happy Jackson users, and maybe there are solutions that won't involve changes to the current priority management of this project, e.g. exposing an SPI for the buffer pooling to let users provide their own (or disable it).
There's https://github.com/FasterXML/jackson-core/issues/835 - if someone wants to do a PR for that which also takes this into account, that would be great. I would not read too much into the 2.15 label on that issue, because we have users crying out for a 2.15 release and not much time to squeeze extra items into that release. And a 2.16 release would probably not happen for quite some time after that.
In the meantime, has any Loom fan tried playing with the JsonFactory.Feature.USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING setting - i.e. disabling this feature?
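For anyone who wants to try it, a minimal sketch of disabling that feature (assuming Jackson 2.x, where JsonFactory.builder() is available; the class name below is made up):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.ObjectMapper;

public class NoThreadLocalRecycling {
    public static void main(String[] args) throws Exception {
        // Build a JsonFactory with ThreadLocal-based buffer recycling disabled,
        // so parsers/generators allocate fresh buffers instead of caching them
        // per thread (trading CPU/allocation for no per-thread retention).
        JsonFactory factory = JsonFactory.builder()
                .disable(JsonFactory.Feature.USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING)
                .build();
        ObjectMapper mapper = new ObjectMapper(factory);
        System.out.println(mapper.writeValueAsString(java.util.Map.of("a", 1)));
    }
}
```

Note this is per-JsonFactory, so a framework can scope it to the ObjectMapper instances used from virtual threads.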
@pjfanning
> tried playing with the JsonFactory.Feature.USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING setting - ie disabling this feature?
TBH I was personally too scared to try it, eheh, but it's a great suggestion and it can be a no-brainer workaround for this in the meantime - although it would just save creating the thread local, while keeping the same problem, i.e. allocating big buffer(s) based on the default sizes, see https://github.com/FasterXML/jackson-core/blob/62c1ef923703654d202b206f345898ed4f6b640a/src/main/java/com/fasterxml/jackson/core/util/BufferRecycler.java#L76-L77
The other problem is that, given it's a wide system-level setting, it will impact everyone; as Netty users we can have mixed types of threads using Jackson, and it will hit the I/O threads too, which are poor "platform" threads (just thinking out loud).
@franz1981 do you need help setting that in Quarkus?
> The other problem is, given that's a wide sys level setting, it will impact everyone, while, as Netty users we can have mixed type of threads using jackson, and will hit the I/O threads too, that are poor "platform" threads (just thinking loud)
In Quarkus we can likely limit the blast radius of this by using separate ObjectMapper objects if needed.
Yes, this is a sort of expected problem, but a tricky one to resolve, esp. wrt configurability. A pluggable buffer recycling implementation would make sense, and wrt timelines rather sooner than later (so definitely in 2.x). I would be happy to help with this, but may not have time to look into it myself very soon (despite considering it important, obviously).
I would definitely suggest trying JsonFactory.Feature.USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING in the meantime.
@franz1981 Just to make sure: this is not really a system-wide setting, but per-JsonFactory (which then is per-ObjectMapper), with all the pros and cons that implies. I have not added use of System properties due to issues with global scope and use by different frameworks; Netty is a good example.
Ultimately, the use of ThreadLocal here is a form of resource pooling and we could use a pool implementation explicitly instead. Thing is, pool implementations are not necessarily very efficient themselves. Anyone have a suggestion about low latency object pool implementations?
Aside from legit questions about buffer pooling and recycling, there's the question of how to interface this with existing JsonFactory/JsonParser/JsonGenerator implementations.
Looking at existing recycling, the core abstraction really is a single BufferRecycler, accessed by JsonFactory via BufferRecyclers. BufferRecyclers has the ThreadLocal. This would not necessarily be a problem if we refactored things a bit -- except for one major problem: there is no release call for BufferRecycler instances, only for actual buffers. This works ok with the traditional threading model (since BufferRecycler instances are per-thread, and so references held by JsonParser/JsonGenerator do not live past their life-cycle), but would not work if the thread-per-BufferRecycler coupling is removed.
And it seems that trying to build a life-cycle for BufferRecycler is the tricky part: the reason there is no release call is both because it is not needed (as I said, the next parser/generator that asks for it is on the same thread) and because figuring out when all reuse is done is not easy: calling code doesn't really have to close() a JsonParser or JsonGenerator (at least for purposes of buffer recycling).
Then again, maybe requiring BufferRecycler release from close() would not be a big deal.
It'd probably also mean that BufferRecycler instances accessed via a centralized pooling mechanism would need to keep references to the "owning" BufferRecyclers so that they can indicate being released.
Maybe I should try figuring out how to do this refactoring after all.
@pjfanning
> Thing is, pool implementations are not necessarily very efficient themselves. Anyone have a suggestion about low latency object pool implementations?
Yep, I'm a developer of the JCTools library (see https://github.com/JCTools/JCTools/blob/master/pom.xml#L42), which implements a few multi-producer multi-consumer queues that could be used as a "simple" shared pool (the other option is a ConcurrentLinkedQueue, but it is very memory hungry and slow in comparison). You can check an independent benchmark at https://vmlens.com/articles/scale/scalability_queue/ and, if you're worried about the dependencies, given that I'm a developer (and inventor) of such algorithms, I have no problem implementing a specialized version just for Jackson.
That said, usually such solutions are not "wow" because a highly contended tail/head is not great, and it's not easy to enforce global memory usage limits, but if you're interested I can propose something ad hoc in the next week that provides the best of the two worlds.
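As a concrete illustration of the shared-pool idea, here is a minimal sketch using only the JDK; the ConcurrentLinkedDeque stands in for a JCTools MpmcArrayQueue, and SharedPool is a made-up name, not a Jackson or JCTools type:

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.function.Supplier;

// Hypothetical shared pool: acquire() reuses a pooled instance if one is
// available, otherwise creates a new one; release() returns it for reuse.
// A bounded JCTools MpmcArrayQueue could be dropped in for the deque to cap
// memory usage and reduce contention on the head/tail.
final class SharedPool<T> {
    private final ConcurrentLinkedDeque<T> pool = new ConcurrentLinkedDeque<>();
    private final Supplier<T> factory;

    SharedPool(Supplier<T> factory) { this.factory = factory; }

    T acquire() {
        T pooled = pool.pollFirst();   // non-blocking: null when pool is empty
        return pooled != null ? pooled : factory.get();
    }

    void release(T resource) {
        pool.offerFirst(resource);     // make the instance available for reuse
    }
}
```

A short-lived virtual thread would acquire() around a parse and release() in a finally block, instead of relying on a ThreadLocal that dies with the thread.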
The other option, instead, is to raise it with loom-dev (in a kind and respectful way), because "they" have the right abstraction for that (mentioned in https://github.com/FasterXML/jackson-core/issues/919#issue-1576266381), i.e. a per-carrier-thread cache. The beauty of providing a generic buffer pool API is that, when/if this nice impl from the JDK ever becomes available, the replacement will be immediate (and will replace the ThreadLocal one too, because it is a "transparent" thread local, ignoring that the caller is a virtual thread).
@cowtowncoder
> It'd probably also mean that BufferRecycler instances accessed using centralized pooling mechanism would need to keep references to "owning" BufferRecyclers so that they can indicate being released.
I cannot find a central point that states that a BufferRecycler can be safely released/recycled (IOContext borrows a single instance, meaning that IOContext is a non-thread-safe, thread-confined instance IIUC); although IOContext can keep track of acquire/release of the individual parts of the buffer, and could decide to release it fully when every already-borrowed one is released, that seems complex and not "natural".
@franz1981 The way release would have to happen, I think, is with close() of the JsonParser/JsonGenerator instance, clearing the local BufferRecycler first, then calling the recycler.release() method. The assumption that buffers themselves are given back first holds, and has been the case for years now (i.e. this has not ever caused an issue, to the best of my knowledge).
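The release-on-close idea could look roughly like the following sketch; ToyParser, Recycler and RecyclerPool are hypothetical stand-ins for the real jackson-core classes:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical recycler: in real jackson-core this would hold the reusable
// byte/char buffers.
final class Recycler { final byte[] buffer = new byte[8_000]; }

// Hypothetical pool the "parser" borrows from and returns to.
final class RecyclerPool {
    private final ConcurrentLinkedQueue<Recycler> pool = new ConcurrentLinkedQueue<>();
    Recycler acquire() {
        Recycler r = pool.poll();
        return r != null ? r : new Recycler();
    }
    void release(Recycler r) { pool.offer(r); }
}

// The parser keeps a back-reference to the "owning" pool, and close() gives
// the recycler back exactly once: clear the local reference first, then
// release, so a double close() is a harmless no-op.
final class ToyParser implements AutoCloseable {
    private final RecyclerPool owner;
    private Recycler recycler;

    ToyParser(RecyclerPool owner) {
        this.owner = owner;
        this.recycler = owner.acquire();
    }

    @Override public void close() {
        Recycler r = recycler;
        if (r != null) {
            recycler = null;      // clear local reference first...
            owner.release(r);     // ...then hand the recycler back to the pool
        }
    }
}
```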
IOContext is a dumb container to pass the recycler to parser/generator and doesn't really have control over the life-cycle. Whether it should be used to pass the instance is a good question; I'll just note that's what happens as of now (it probably was convenient at some point, avoiding passing more arguments to parser/generator constructors).
Given just a bit of time I could refactor things bit by bit for 2.15. But... time is limited alas :)
Another alternative: https://github.com/DanielYWoo/fast-object-pool - the last still-living object pool for Java.
It has an advantage: it can validate objects (similar to what JDBC connection pools do).
@franz1981 BTW Quarkus has its own HikariCP killer: the Agroal JDBC connection pool https://agroal.github.io/
Is it possible to extract the core pool from it?
Just to make sure: I do not think we want to include anything complicated or sizable within jackson-core. Instead these should be add-ons, plugged in via the usual extension mechanism. And such a plug-in could just use regular dependencies on an object pool library/framework.
It looks like Fast Object Pool might be a good base for such an extension.
@magicprinc the question about Agroal can be answered by @barreiro
> @franz1981 BTW Quarkus has its own HikariCP Killer: agroal jdbc connection pool https://agroal.github.io/
> Is is possible to extract core-pool from it?
No, it's not possible. Agroal is a specialized JDBC pool; it doesn't abstract that part away, in order to obtain the best performance.
BTW, how has Spring 6 solved this problem? They had a lot of things in ThreadLocal, e.g. transaction processing (the active transaction), HttpRequest/Response, the Spring context itself (in a web environment).
@cowtowncoder
It would be easier to make such add-ons if all ThreadLocals were encapsulated in one Context class with all cached classes, and then a user could choose:
1. a ThreadLocal add-on to store Context
2. a Vert.x add-on which can store it in the Vert.x reactive Context
3. an object pool
4. JCTools' MPMC queue
5. https://github.com/EsotericSoftware/kryo 's Pool
FWIW we do have a non-ThreadLocal buffer recycler pool now, although it needs some final tweaks (wrt #1106). The default for 2.16 at least will remain the ThreadLocal-backed one, but beyond that we could consider changing the default. And frameworks are obviously free to re-configure defaults as they see fit starting with 2.16 (but may be limited wrt the range of Jackson versions supported).
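For frameworks re-configuring the default starting with 2.16, the setup looks roughly like this (a sketch assuming the 2.16 JsonRecyclerPools factory and the builder's recyclerPool() method):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.util.JsonRecyclerPools;
import com.fasterxml.jackson.databind.ObjectMapper;

public class RecyclerPoolConfig {
    public static void main(String[] args) throws Exception {
        // Swap the default ThreadLocal-backed recycler pool for a shared,
        // deque-based pool that does not tie recyclers to (virtual) threads.
        JsonFactory factory = JsonFactory.builder()
                .recyclerPool(JsonRecyclerPools.newConcurrentDequePool())
                .build();
        ObjectMapper mapper = new ObjectMapper(factory);
        System.out.println(mapper.writeValueAsString(java.util.Map.of("ok", true)));
    }
}
```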
> @cowtowncoder It would be easier to make such add-ons if all ThreadLocal's were encapsulated in one Context class with all cached classes and then a user could choose: 1. ThreadLocal add-on to store Context 2. Vert.x add-on which can store it in Vert.x Reactive Context 3. Object Pool
Already adding pluggability for 2.16; there is only one use of ThreadLocal in jackson-core. Usage is not widespread, although I think one or two dataformats (Smile, I think) have limited additional usage. In general I am not a fan of ThreadLocal usage, so it's a somewhat bounded problem.
@franz1981 Are there any benchmarks: MPMC queue -vs- https://github.com/DanielYWoo/fast-object-pool -vs- the same fast-object-pool but with the Disruptor (an available option)?
PS: I mention fast-object-pool because it is the last living generic Java object pool (recent commits on GitHub) and it can use the Disruptor.
I know two other pools, but they are not separate projects, just classes inside a main project: the Kryo Pool and the Vert.x Pool.
Sorry for the misinformation about Spring.
They still use ThreadLocal (e.g. https://github.com/spring-projects/spring-framework/blob/main/spring-tx/src/main/java/org/springframework/transaction/support/TransactionSynchronizationManager.java), BUT they use ThreadLocal not as a cache, but as a context or "session".
So: Thread.start → startTransaction → session created → obj1.methodA → obj2.methodB → commit transaction → session closed → thread.stop
No difference between platform thread and virtual thread.
So, this is a completely different story :-(
@magicprinc
> Are there any benchmarks: MPMC Queue -vs- https://github.com/DanielYWoo/fast-object-pool -vs- same fast-object-pool but with disruptor (an available option)?
Users of Quarkus (this is not the right place to discuss that, so please raise the question on the Quarkus GitHub repo) can customize Jackson, which will include, once the new release with the custom pool is out, the ability to use the one they prefer.
If it was my call/choice I still wouldn't pick fast-object-pool, because it can use conversantmedia:disruptor, which I really dislike, 'cause it has been named after the original one as click-bait.

That said, @mariofusco is using a proper JMH microbenchmark to evaluate different options, including the ones currently offered in Jackson by default.
@franz1981 Thank You! One hundred likes! 👍
My finest solution for how to work with ANY version of Jackson in virtual threads without ANY object pool ⇒ combining the strengths of different pools.
@Test void _jacksonInVtConcept() throws InterruptedException {
  Executor handMadeVirtualThreadExecutor = r -> new Thread(r).start(); // empty ThreadLocal
  Callable<String> taskReqThreadLocal = () -> {
    var m = Jack.MAPPER.readValue("{a:1,b:true}", Map.class);
    return Jack.MAPPER.writeValueAsString(m);
  };
  var doneSignal = new CountDownLatch(2);
  // 1) slow "classic"
  handMadeVirtualThreadExecutor.execute(Unchecked.runnable(() -> {
    var jsonStr = taskReqThreadLocal.call();
    assertEquals("{\"a\":1,\"b\":true}", jsonStr);
    doneSignal.countDown();
  }));
  // 2) fast "magic"
  handMadeVirtualThreadExecutor.execute(Unchecked.runnable(() -> {
    var jsonStr = ForkJoinPool.commonPool().submit(taskReqThreadLocal).get();
    assertEquals("{\"a\":1,\"b\":true}", jsonStr);
    doneSignal.countDown();
  }));
  assertTrue(doneSignal.await(2, TimeUnit.SECONDS));
}
@cowtowncoder I think we can happily close this :) thanks for the hard work @pjfanning @cowtowncoder and @mariofusco !!!
Yes! Thank you for reminder @franz1981.
Hi, is there a conclusion on supporting virtual threads? I haven't seen any PR code change for jackson-core.
As for this one, it's a workaround, but it is somewhat slower compared with direct execution.
Virtual Threads - millions, slow, cheap to create and block, "empty" ThreadLocal. Block them easily.
> //2 fast "magic" handMadeVirtualThreadExecutor.execute(Unchecked.runnable(()->{ var jsonStr = ForkJoinPool.commonPool().submit(taskReqThreadLocal).get(); assertEquals("{\"a\":1,\"b\":true}", jsonStr); doneSignal.countDown(); }));
@magicprinc This one will be ~2-2.5x slower compared with direct execution; I just tested on my macOS with a simple object.
By the way, using an OS thread is 7x slower.
We need a feature like a locked CarrierThreadLocal, which would be like ScopedValue but used over a smaller range, for using ThreadLocal-cached objects.
I thought the usage could be similar to this snippet:
  // some virtual-thread context here
  // use the CarrierThreadLocal to do the serialization
  var bytes = CarrierScopedValue.get(ObjectMapper)              // get the CarrierThreadLocal object
      .execute(objectMapper -> { objectMapper.writeXXX(); });   // execute "atomic" logic with the thread-local object
  // continue logic on the virtual thread, using the result `bytes`

The CarrierScopedValue.execute method could promise that the virtual thread will not be unmounted from the current CarrierThread.
It would be a perfect solution if the JDK could support CarrierThreadLocal for the object-pool-via-ThreadLocal requirement on virtual threads.
@cowtowncoder I think this issue could be closed due to https://github.com/FasterXML/jackson-core/pull/1061 being already merged
@franz1981 Issue was closed a while ago.
@LoranceChen Let's not discuss things on closed threads: if more work/improvements suggested, please file a new issue (with link back to this one as background, if that makes sense).
Got it. I will open another topic if needed.
Hi team,
this has been raised because, as Jackson users (i.e. https://github.com/quarkusio/quarkus with existing (experimental) support for https://quarkus.io/guides/virtual-threads), we would like to help fix any integration issues with this technology.
Based on https://openjdk.org/jeps/425 and https://openjdk.org/jeps/429, re thread-local pooling: in short, https://openjdk.org/jeps/429 warns against using ThreadLocals as a pooling pattern, and the Scoped Values mechanism won't save the day nor will it be able to replace such a mechanism.

That said, where does Jackson exhibit this problem? Let me attach some data:
In the above flamegraph, a lot of cycles are spent in https://github.com/FasterXML/jackson-core/blob/390472bbdf4722fe058f48bb0eff5865c8d20f73/src/main/java/com/fasterxml/jackson/core/util/BufferRecyclers.java#L67 while called from a virtual thread, forcing both ThreadLocal and subsequent pooled-buffer allocations.

The common usage scenario for virtual threads is NOT to cache them (the Loom team advises against it) and for them to be numerous and short-lived; meaning that Jackson will end up creating tons of useless garbage that won't be reused, defeating the purpose and benefit of pooling.
What solutions exist here?

For 1 we can make use of JCTools MPMC queues, designed to be very good with contention (see https://github.com/JCTools/JCTools/wiki/Getting-Started-With-JCTools#example-a-simple-object-pool for more info), but still, if we care about being friendly under contention, we could use a striped approach a-la Striped64, or something simpler too, e.g. an array of atomic refs on which to perform getAndSet or compareAndSet (and distribute the first picking choice with Knuth multiplicative hashing + xorshifts).

For 2 I suggest raising the problem (I can help, in case) on the loom-dev list and asking if the JDK team has plans to open up that nice (and dangerous) API in some (better and less risky) form, to allow ThreadLocal pooling users to still use the same exact mechanism.
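The "array of atomic refs" variant of option 1 can be sketched with the JDK alone; StripedPool is a made-up name, and the slot-spreading hash is a simple multiplicative mix rather than a tuned Knuth + xorshift scheme:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Supplier;

// Striped pool sketch: each slot holds at most one pooled instance, and a
// thread starts probing from a slot derived from its thread id, so concurrent
// threads mostly touch different slots (less contention than a single shared
// head/tail). The fixed slot count also bounds global memory usage.
final class StripedPool<T> {
    private final AtomicReferenceArray<T> slots;
    private final Supplier<T> factory;

    StripedPool(int stripes, Supplier<T> factory) {
        this.slots = new AtomicReferenceArray<>(stripes);
        this.factory = factory;
    }

    private int startSlot() {
        // Spread thread ids across slots with a multiplicative hash.
        long h = Thread.currentThread().getId() * 0x9E3779B97F4A7C15L;
        return Math.floorMod(h ^ (h >>> 32), slots.length());
    }

    T acquire() {
        int start = startSlot();
        for (int i = 0; i < slots.length(); i++) {
            int slot = (start + i) % slots.length();
            T pooled = slots.getAndSet(slot, null);   // claim slot atomically
            if (pooled != null) return pooled;
        }
        return factory.get();                          // pool empty: allocate
    }

    void release(T resource) {
        int start = startSlot();
        for (int i = 0; i < slots.length(); i++) {
            int slot = (start + i) % slots.length();
            if (slots.compareAndSet(slot, null, resource)) return;
        }
        // all slots full: drop the instance, keeping memory usage bounded
    }
}
```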