Open Enigo opened 7 months ago
what does your model look like? some Object field? maybe you should try removing that
The LRUMap is one implementation of LookupCache. You could write your own implementation that doesn't have thread safety (e.g. unguarded reads/writes to a HashMap). I have https://github.com/pjfanning/jackson-caffeine-cache (a Caffeine-based LookupCache), which is another alternative.
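For illustration, a minimal sketch of what such a custom implementation could look like. The class name is hypothetical, and the `implements LookupCache` declaration is left as a comment so the sketch stands alone; a real version would implement Jackson's `com.fasterxml.jackson.databind.util.LookupCache` interface, which has these same method signatures in 2.14.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a lock-free cache backed by ConcurrentHashMap. In real code this
// would declare: implements com.fasterxml.jackson.databind.util.LookupCache<K, V>
// (omitted here so the sketch compiles without jackson-databind on the classpath).
class ConcurrentLookupCache<K, V> {
    private final ConcurrentHashMap<K, V> map;
    private final int maxEntries;

    ConcurrentLookupCache(int initialCapacity, int maxEntries) {
        this.map = new ConcurrentHashMap<>(initialCapacity);
        this.maxEntries = maxEntries;
    }

    public int size() { return map.size(); }

    public V get(Object key) { return map.get(key); }

    public V put(K key, V value) {
        // Crude bound: drop everything when full; a real implementation
        // might evict more gracefully.
        if (map.size() >= maxEntries) {
            map.clear();
        }
        return map.put(key, value);
    }

    public V putIfAbsent(K key, V value) {
        if (map.size() >= maxEntries) {
            map.clear();
        }
        return map.putIfAbsent(key, value);
    }

    public void clear() { map.clear(); }
}
```

The trade-off is that the "clear when full" bound briefly throws away all cached entries, so it only makes sense when the working set fits comfortably under the limit.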
The LRUMap's putIfAbsent might do an optimistic get first to avoid locking. That might be enough; otherwise swap the implementations as @pjfanning suggested (Caffeine includes that optimization).
edit: It looks like TypeFactory does a get earlier, so that optimization idea might not have much effect. It may simply be that the default cache size is too small (200 entries)?
If TypeFactory caching is truly becoming a hotspot, there is likely something you can do to avoid it by eagerly resolving the target type into a JavaType. This bottleneck definitely should not normally occur.
I don't think it makes sense to try to use a custom cache implementation, but rather to see why type resolution becomes a bottleneck and whether it can be avoided.
thanks everyone for replying!
@yawkat we don't have any Object fields, only String, Integer, Collection, List, Set, nothing fancy
@ben-manes @pjfanning in my local tests the type map never exceeds 200 entries, and I also tried with a huge size but it didn't help. I'll give Caffeine a try; saw some other benchmarks with good results as well
@cowtowncoder do you mean something along these lines?
```java
public byte[] marshalToBytes(final Object bean) {
    try {
        final var javaType = objectMapper.constructType(bean.getClass());
        final var objectWriter = objectMapper.writerFor(javaType);
        return objectWriter.writeValueAsBytes(bean);
    } catch (final JsonProcessingException e) {
        return null;
    }
}
```
or more of doing

```java
final var javaType = objectMapper.constructType(bean.getClass());
final var objectWriter = objectMapper.writerFor(javaType);
```

once on service startup?
@Enigo the latter; the former wouldn't help. This assumes you know the types in advance.
You could also pre-create ObjectWriters, but if the contention is wrt type resolution, retaining JavaType (which is the result of type resolution) matters most.
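As a sketch of the "latter" approach: resolve the JavaType and ObjectWriter once, when the converter is constructed, and reuse them on every call. The BeanMarshaller class name is hypothetical; it just restructures the snippet from the question so that type resolution leaves the hot path.

```java
import com.fasterxml.jackson.databind.JavaType;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;

// Hypothetical converter that resolves JavaType and ObjectWriter once,
// at construction time, instead of on every marshal call.
class BeanMarshaller {
    private final ObjectWriter objectWriter;

    BeanMarshaller(ObjectMapper objectMapper, Class<?> beanClass) {
        // Type resolution happens here, once per converter...
        JavaType javaType = objectMapper.constructType(beanClass);
        this.objectWriter = objectMapper.writerFor(javaType);
    }

    byte[] marshalToBytes(Object bean) {
        try {
            // ...so the hot path never touches TypeFactory's cache for the root type.
            return objectWriter.writeValueAsBytes(bean);
        } catch (com.fasterxml.jackson.core.JsonProcessingException e) {
            return null;
        }
    }
}
```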
But looking at the flame graph a bit more I realized it might not be that simple. Instead, it looks like some serializers (JsonSerializer) were not being properly cached. For that, pre-creating a typed ObjectWriter should help, but it also should not be necessary as long as you retain one ObjectMapper.
Increasing the cache size or using a different cache for TypeFactory seems unlikely to help, I think.
what I'm confused by is why you spend so much time in _findAndAddDynamic. Ideally that call should only happen a few times before the result is cached. You mention you already tried a fixed ObjectReader/ObjectWriter, but the flame graph doesn't show that, so please move back to that. The only reason I can think of to repeatedly hit _findAndAddDynamic with a proper ObjectWriter would be some weird dynamic field situation
Agree with what @yawkat said. I guess there is a possibility that some Collection (or possibly Object-valued property) has values of many different types and in-serializer dynamic caching hits its limit (there's a simple linear, fixed-size lookup data structure; it expands to at most 8 type/serializer entries, then resets). The reason for the fixed size is that it's a simple linear lookup, not hash-based.
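To illustrate why that matters, here is a toy model (this is NOT Jackson's actual internal data structure, just the behavior described): a linear-lookup cache capped at 8 entries that resets when full. When a property sees more distinct runtime types than the cap, the cache keeps resetting and the expensive serializer resolution runs over and over.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a fixed-size linear-lookup cache that resets when full.
// Illustrates why a property seeing many distinct runtime types keeps
// re-resolving serializers forever instead of hitting the cache.
class TinyLinearCache {
    static final int MAX = 8;
    private List<Class<?>> known = new ArrayList<>();
    int resolveCalls = 0; // how often the "expensive" resolution had to run

    void serialize(Object value) {
        Class<?> t = value.getClass();
        if (!known.contains(t)) {          // linear scan, cheap while small
            resolveCalls++;                // stand-in for full serializer resolution
            if (known.size() >= MAX) {
                known = new ArrayList<>(); // cache full: reset and start over
            }
            known.add(t);
        }
    }
}
```

With 8 or fewer distinct types, resolution runs once per type and then everything is a cache hit; with 9 or more cycling through, the resets mean resolution keeps running for the lifetime of the process.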
So the problem is not so much type resolution (although there are generic types being resolved, as per the stack traces) but re-creating value serializers, I think. A problem in itself, likely bigger (CPU-usage wise) than contention on type resolution.
here is a flamegraph of the version with a separate ObjectMapper for each of those converters from point (3), with the ObjectWriter created from the ObjectMapper. And to be precise: this is a flamegraph of the single converter that takes the longest wait time
and this one is the same version but with Caffeine instead. Both graphs show values of Lock Wait Time.
I'm running a test with Caffeine as I type this message, and so far it actually runs more stably on 16x, which I haven't seen in any of my previous attempts. Still, it is not perfectly smooth, and the recorded lock wait time seems quite high.
With a separate ObjectMapper per converter we are now serializing only one Java class, which is quite big though, with many fields and nested fields. Do you think it would be beneficial to call objectWriter.writeValueAsBytes with an instance of this class during converter initialization, so that the type is known and cached?
Interesting! Calling writeValueAsBytes() (or writeValue() with a Writer/OutputStream that does nothing, i.e. drops output) might help... although looking at the flamegraphs, I realized that the problem spot is likely not within the root value but the dynamic part of dependencies, which may or may not get pre-created. Still, it should not hurt, at the very least.
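A minimal sketch of that warm-up idea, assuming Java 11+ (for `OutputStream.nullOutputStream()`); the WriterWarmup class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;

// Sketch of the warm-up: during converter initialization, serialize a
// representative instance once and discard the output, so serializers get
// resolved and cached before real traffic arrives. The sample bean is a
// placeholder; in practice you'd pass an instance of your actual class.
class WriterWarmup {
    static ObjectWriter warmedWriter(ObjectMapper mapper, Object sampleBean) {
        try {
            ObjectWriter writer = mapper.writerFor(mapper.constructType(sampleBean.getClass()));
            // Java 11+: a sink that discards everything written to it
            writer.writeValue(OutputStream.nullOutputStream(), sampleBean);
            return writer;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

As noted above, this pre-creates the root serializer but may not cover dynamically resolved dependencies, so it helps most when the sample instance actually populates the polymorphic parts of the object graph.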
I'm still confused by the new flame graphs. The bulk of the time is spent in _findAndAddDynamic. As I understand it, this method is supposed to populate a cache; it's not going to be on the hot path, because the hot path uses the existing cached value. Your flame graph looks like that is not what's happening, so you should investigate why that is; maybe the writer is not reused properly?
Another possibility is that your flame graph is misleading. You say the graph shows lock wait time. But if that's really all it shows, this result is entirely expected: the hot path should have zero lock wait time, so it won't show up, but the warmup may wait on locks as it populates caches. In theory this should not have a perf impact after the warmup; have you verified with async-profiler or another normal CPU sampler? The warmup may also have a significant perf impact before it settles down, if you have many types hitting the same cache concurrently. The question is how much it matters for your app.
@yawkat I was guessing that the time taken is actually from the call to find (and, as necessary, construct, including all introspection) the actual serializer. This would be a heavyweight operation. But then again the flamegraph does not seem to suggest that... which makes me wonder if it could be an artifact of some kind.
lemme try to answer some of the questions

> maybe the writer is not reused properly

definitely reused, tested it thoroughly

> have you verified with async-profiler or another normal cpu sampler?

nope, haven't done that

> The warmup may also have significant perf impact before it settles down, if you have many types hitting the same cache concurrently. Question is how much it matters for your app.

that's pretty much the issue with my app: instances are getting overloaded on startup, so it matters a lot
but the good news is that with Caffeine and calling writeValueAsBytes on bean initialization we don't observe severe locking time on startup anymore, and even though it is still there, it doesn't seem to affect the instances
Thank you @Enigo. It is quite peculiar that there is this contention, but I'm glad that a combination of changes can resolve it.
I would still be interested in how much of the solution is the use of full Caffeine classes for the cache, vs. warm-up with writeValueAsBytes() (or similar).
ah, now there is somewhat of a different issue. I added the Caffeine cache like this:
```java
private static class CaffeineLookupCache<K, V> implements LookupCache<K, V> {
    private final ConcurrentMap<K, V> cache;

    CaffeineLookupCache(final int initialCapacity, final int maximumSize) {
        final Cache<K, V> cache = Caffeine.newBuilder()
                .initialCapacity(initialCapacity)
                .maximumSize(maximumSize)
                .build();
        this.cache = cache.asMap();
    }

    @Override
    public V putIfAbsent(final K key, final V value) {
        return cache.putIfAbsent(key, value);
    }
    ...
}
```
and then using it like this:

```java
final var objectMapper = new ObjectMapper();
objectMapper.setTypeFactory(TypeFactory.defaultInstance()
        .withCache(new CaffeineLookupCache<>(200, 1000)));
```
I see this warning in the logs:

> The cache is experiencing excessive wait times for acquiring the eviction lock. This may indicate that a long-running computation has halted eviction when trying to remove the victim entry. Consider using AsyncCache to decouple the computation from the map operation
and the stacktrace points to ObjectWriter.writeValueAsBytes:

```
java.util.concurrent.TimeoutException: null
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.lock(BoundedLocalCache.java:1569) ~[caffeine-3.1.8.jar:3.1.8]
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.afterWrite(BoundedLocalCache.java:1547) ~[caffeine-3.1.8.jar:3.1.8]
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.put(BoundedLocalCache.java:2314) ~[caffeine-3.1.8.jar:3.1.8]
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.putIfAbsent(BoundedLocalCache.java:2283) ~[caffeine-3.1.8.jar:3.1.8]
    at com.mycompany.json.CaffeineLookupCache.putIfAbsent(CaffeineLookupCache.java:75) ~[commons-utils-24.19.jar:24.19]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:1523) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromParamType(TypeFactory.java:1632) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromAny(TypeFactory.java:1395) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._resolveSuperClass(TypeFactory.java:1534) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:1493) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromParamType(TypeFactory.java:1632) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromAny(TypeFactory.java:1395) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._resolveSuperClass(TypeFactory.java:1534) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:1493) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._bindingsForSubtype(TypeFactory.java:531) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory.constructSpecializedType(TypeFactory.java:510) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.SerializerProvider.constructSpecializedType(SerializerProvider.java:347) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.BeanPropertyWriter._findAndAddDynamic(BeanPropertyWriter.java:898) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.module.blackbird.ser.ObjectPropertyWriter.serializeAsField(ObjectPropertyWriter.java:75) ~[jackson-module-blackbird-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:774) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:178) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.module.blackbird.ser.ObjectPropertyWriter.serializeAsField(ObjectPropertyWriter.java:95) ~[jackson-module-blackbird-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:774) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:178) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1572) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ObjectWriter._writeValueAndClose(ObjectWriter.java:1273) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ObjectWriter.writeValueAsBytes(ObjectWriter.java:1163) ~[jackson-databind-2.14.2.jar:2.14.2]
    ...
```
I must admit I'm kind of clueless about what to do with this, so any hints would be greatly appreciated, thanks!
It's strange, as that usually occurs due to long computations which might block an eviction from a neighboring entry in the hash map. It could be that you flood ForkJoinPool.commonPool() so it's not evicting regularly, though a write should barge in. You can try executor(Runnable::run) and see if that helps.
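If I understand the suggestion correctly, that means adding one line to the Caffeine builder from the earlier snippet, so maintenance work runs on the calling thread instead of ForkJoinPool.commonPool(). A sketch (class name hypothetical):

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.ConcurrentMap;

// Sketch: same Caffeine setup as in the earlier snippet, but with
// eviction/maintenance run on the writer thread via executor(Runnable::run)
// instead of being scheduled on ForkJoinPool.commonPool().
class SameThreadCaffeineCache<K, V> {
    final ConcurrentMap<K, V> map;

    SameThreadCaffeineCache(int initialCapacity, long maximumSize) {
        Cache<K, V> cache = Caffeine.newBuilder()
                .initialCapacity(initialCapacity)
                .maximumSize(maximumSize)
                .executor(Runnable::run) // run maintenance on the calling thread
                .build();
        this.map = cache.asMap();
    }
}
```

The trade-off is that writers occasionally pay for maintenance inline, in exchange for not depending on the common pool being responsive.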
sorry for the late reply, didn't have time to properly look at the issue. what do you actually mean by

> You can try executor(Runnable::run) and see if that helps.

I use Caffeine for the Jackson ObjectMapper TypeFactory cache
It appears to be happening only on instance startup for some time. The longest I've seen was ~40 minutes of warnings in the logs, around 9k log entries. Then it stops and never reappears. So something must be overloading the cache at instance startup.
@Enigo overload during startup makes lots of sense, since that is also when (de)serializers get constructed, a big part of which is introspection of classes, methods, fields, constructors and all relevant types. And this tends not to happen later on, when all/most (de)serializers are cached and reused.
Although the earlier discussion did suggest that not all serializers are fully cached (possibly due to "wider" polymorphism for values being serialized than what the standard BeanSerializer caching expects). But the pattern itself makes some sense.
Search before asking
Describe the bug
Hello everyone! There is a curious issue I want to share regarding the TypeFactory::_typeCache and its underlying map.

But first a bit of context. We run high-load servers processing lots of requests (300k+ per second), which translates to 5-6k rps per single machine in the cluster. Those requests are all JSON, so we do lots of serialization and deserialization operations. Here is a very simplified flow: ObjectMappers (for serialization and deserialization) are created once and shared across the entire application.

Now onto the issue. We are running in AWS on 4xlarge machines and recently decided to migrate to the bigger 16x ones. That of course means that every single instance will be processing 4-5x more rps. And with that our app couldn't scale, and after running a few flamegraphs everything points to
LRUMap. I attached two images; they are from the same flamegraph, one for incoming and another for outgoing request processing, and lastly the lock time.

I've been looking around the internals of the Jackson lib and it seems to me that in our particular case the cache used in TypeFactory is not scaling with the instance size. Basically, when running bigger instances, the newly launched instances have all available threads busy for like 2-5 minutes, which pretty much makes the entire cluster sorta "dead".

I've tried a couple of optimizations that seemed to mitigate the issue, but it is still spending quite a lot of time waiting:
- ObjectWriter and ObjectReader instead of ObjectMapper
- a separate ObjectMapper for each of those converters from point (3)
- objectMapper.setTypeFactory(TypeFactory.defaultInstance().withCache(new LRUMap<>(10000, 10000)));
From a code POV, what we run are simple objectWriter.writeValueAsBytes and objectReader.readValue operations, and ObjectMapper is configured like this

Everywhere it is stated that ObjectMapper is thread-safe, which is true; but if this is indeed the expected behavior, that thread safety comes with quite a price tag, it seems.

Appreciate any feedback and hints on how to solve this, thanks!
Version Information
2.14.2
Reproduction
Expected behavior
No response
Additional context
No response