Open Enigo opened 7 months ago
what does your model look like? some Object field? maybe you should try removing that
The LRUMap is one implementation of LookupCache. You could write your own implementation that doesn't have thread safety (e.g. unguarded reads/writes to a HashMap). I have https://github.com/pjfanning/jackson-caffeine-cache (a Caffeine-based LookupCache), which is another alternative.
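For illustration, a minimal sketch of what such a custom implementation could look like. The class name is hypothetical, and the `implements LookupCache` declaration is left as a comment so the sketch stands alone; a real version would implement Jackson's `com.fasterxml.jackson.databind.util.LookupCache` interface, which has these same method signatures in 2.14.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a lock-free cache backed by ConcurrentHashMap. In real code this
// would declare: implements com.fasterxml.jackson.databind.util.LookupCache<K, V>
// (omitted here so the sketch compiles without jackson-databind on the classpath).
class ConcurrentLookupCache<K, V> {
    private final ConcurrentHashMap<K, V> map;
    private final int maxEntries;

    ConcurrentLookupCache(int initialCapacity, int maxEntries) {
        this.map = new ConcurrentHashMap<>(initialCapacity);
        this.maxEntries = maxEntries;
    }

    public int size() { return map.size(); }

    public V get(Object key) { return map.get(key); }

    public V put(K key, V value) {
        // Crude bound: drop everything when full; a real implementation
        // might evict more gracefully.
        if (map.size() >= maxEntries) {
            map.clear();
        }
        return map.put(key, value);
    }

    public V putIfAbsent(K key, V value) {
        if (map.size() >= maxEntries) {
            map.clear();
        }
        return map.putIfAbsent(key, value);
    }

    public void clear() { map.clear(); }
}
```

The trade-off is that the "clear when full" bound briefly throws away all cached entries, so it only makes sense when the working set fits comfortably under the limit.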
The LRUMap's putIfAbsent might do an optimistic get first to avoid locking. That might be enough; otherwise swap the implementations as @pjfanning suggested (Caffeine includes that optimization).
edit: It looks like TypeFactory does a get earlier, so that optimization idea might not have much effect. It may simply be that the default cache size is too small (200 entries)?
If TypeFactory caching is truly becoming a hotspot, there is likely something you can do to avoid it by eagerly resolving the target type into a JavaType. This bottleneck definitely should not normally occur.
I don't think it makes sense to try to use a custom cache implementation, but rather to see why type resolution becomes a bottleneck and whether it can be avoided.
thanks everyone for replying!
@yawkat we don't have any Object fields, only String, Integer, Collection, List, Set, nothing fancy
@ben-manes @pjfanning in my local tests the type map never exceeds 200 entries, and I also tried with a huge size but it didn't help. I'll give Caffeine a try; saw some other benchmarks with good results as well
@cowtowncoder do you mean something along these lines?
```java
public byte[] marshalToBytes(final Object bean) {
    try {
        final var javaType = objectMapper.constructType(bean.getClass());
        final var objectWriter = objectMapper.writerFor(javaType);
        return objectWriter.writeValueAsBytes(bean);
    } catch (final JsonProcessingException e) {
        return null;
    }
}
```
or more of doing

```java
final var javaType = objectMapper.constructType(bean.getClass());
final var objectWriter = objectMapper.writerFor(javaType);
```

once on service startup?
@Enigo the latter; the former wouldn't help. This assumes you know the types in advance.
You could also pre-create ObjectWriters, but if the contention is wrt type resolution, retaining JavaType (which is the result of type resolution) matters most.
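As a sketch of the "latter" approach: resolve the JavaType and ObjectWriter once, when the converter is constructed, and reuse them on every call. The BeanMarshaller class name is hypothetical; it just restructures the snippet from the question so that type resolution leaves the hot path.

```java
import com.fasterxml.jackson.databind.JavaType;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;

// Hypothetical converter that resolves JavaType and ObjectWriter once,
// at construction time, instead of on every marshal call.
class BeanMarshaller {
    private final ObjectWriter objectWriter;

    BeanMarshaller(ObjectMapper objectMapper, Class<?> beanClass) {
        // Type resolution happens here, once per converter...
        JavaType javaType = objectMapper.constructType(beanClass);
        this.objectWriter = objectMapper.writerFor(javaType);
    }

    byte[] marshalToBytes(Object bean) {
        try {
            // ...so the hot path never touches TypeFactory's cache for the root type.
            return objectWriter.writeValueAsBytes(bean);
        } catch (com.fasterxml.jackson.core.JsonProcessingException e) {
            return null;
        }
    }
}
```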
But looking at the flame graph a bit more I realized it might not be that simple. Instead, it looks like some serializers (JsonSerializer) were not being properly cached. For that, pre-creating a typed ObjectWriter should help, but it also should not be necessary as long as you retain one ObjectMapper.
Increasing the cache size or using a different cache for TypeFactory seems unlikely to help, I think.
what I'm confused by is why you spend so much time in _findAndAddDynamic. Ideally that call should only happen a few times before the result is cached. You mention you already tried a fixed ObjectReader/ObjectWriter, but the flame graph doesn't show that, so please move back to that. The only reason I can think of to repeatedly hit _findAndAddDynamic with a proper ObjectWriter would be some weird dynamic field situation
Agree with what @yawkat said. I guess there is a possibility that some Collection (or possibly Object-valued property) has values of many different types and in-serializer dynamic caching hits its limit (there's a simple linear, fixed-size lookup data structure; it expands to at most 8 type/serializer entries, then resets). The reason for the fixed size is that it's a simple linear lookup, not hash-based.
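To illustrate why that matters, here is a toy model (this is NOT Jackson's actual internal data structure, just the behavior described): a linear-lookup cache capped at 8 entries that resets when full. When a property sees more distinct runtime types than the cap, the cache keeps resetting and the expensive serializer resolution runs over and over.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a fixed-size linear-lookup cache that resets when full.
// Illustrates why a property seeing many distinct runtime types keeps
// re-resolving serializers forever instead of hitting the cache.
class TinyLinearCache {
    static final int MAX = 8;
    private List<Class<?>> known = new ArrayList<>();
    int resolveCalls = 0; // how often the "expensive" resolution had to run

    void serialize(Object value) {
        Class<?> t = value.getClass();
        if (!known.contains(t)) {          // linear scan, cheap while small
            resolveCalls++;                // stand-in for full serializer resolution
            if (known.size() >= MAX) {
                known = new ArrayList<>(); // cache full: reset and start over
            }
            known.add(t);
        }
    }
}
```

With 8 or fewer distinct types, resolution runs once per type and then everything is a cache hit; with 9 or more cycling through, the resets mean resolution keeps running for the lifetime of the process.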
So the problem is not so much type resolution (although there are generic types being resolved, as per the stack traces) but re-creating value serializers, I think. A problem in itself, likely bigger (CPU-usage wise) than contention on type resolution.
here is a flamegraph of the version with a separate ObjectMapper for each of those converters from point (3), with the ObjectWriter created from the ObjectMapper. And to be precise: this is a flamegraph of the single converter that takes the longest wait time
and this one is the same version but with Caffeine instead. Both graphs show values of Lock Wait Time.
I'm running a test with Caffeine as I type this message, and so far it actually runs more stably on 16x, which I haven't seen in any of my previous attempts. Still, it is not perfectly smooth, and the recorded lock wait time seems quite high.
With a separate ObjectMapper per converter we are now serializing only one Java class, which is quite big though, with many fields and nested fields. Do you think it would be beneficial to call objectWriter.writeValueAsBytes with an instance of this class during converter initialization, so that the type is known and cached?
Interesting! Calling writeValueAsBytes() (or writeValue() with a Writer/OutputStream that does nothing, i.e. drops output) might help... although looking at the flamegraphs, I realized that the problem spot is likely not within the root value but the dynamic part of dependencies, which may or may not get pre-created. Still, it should not hurt, at the very least.
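A minimal sketch of that warm-up idea, assuming Java 11+ (for `OutputStream.nullOutputStream()`); the WriterWarmup class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;

// Sketch of the warm-up: during converter initialization, serialize a
// representative instance once and discard the output, so serializers get
// resolved and cached before real traffic arrives. The sample bean is a
// placeholder; in practice you'd pass an instance of your actual class.
class WriterWarmup {
    static ObjectWriter warmedWriter(ObjectMapper mapper, Object sampleBean) {
        try {
            ObjectWriter writer = mapper.writerFor(mapper.constructType(sampleBean.getClass()));
            // Java 11+: a sink that discards everything written to it
            writer.writeValue(OutputStream.nullOutputStream(), sampleBean);
            return writer;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

As noted above, this pre-creates the root serializer but may not cover dynamically resolved dependencies, so it helps most when the sample instance actually populates the polymorphic parts of the object graph.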
I'm still confused by the new flame graphs. The bulk of the time is spent in _findAndAddDynamic. As I understand it, this method is supposed to populate a cache; it's not going to be on the hot path, because the hot path uses the existing cached value. Your flame graph looks like that is not what's happening, so you should investigate why that is; maybe the writer is not reused properly?
Another possibility is that your flame graph is misleading. You say the graph shows lock wait time. But if that's really all it shows, this result is entirely expected: the hot path should have zero lock wait time, so it won't show up, but the warmup may wait on locks as it populates caches. In theory this should not have a perf impact after the warmup; have you verified with async-profiler or another normal CPU sampler? The warmup may also have a significant perf impact before it settles down, if you have many types hitting the same cache concurrently. The question is how much it matters for your app.
@yawkat I was guessing that the time taken is actually from the call to find (and, as necessary, construct, including all introspection) the actual serializer. This would be a heavyweight operation. But then again the flamegraph does not seem to suggest that... which makes me wonder if it could be an artifact of some kind.
lemme try to answer some of the questions

> maybe the writer is not reused properly

definitely reused, tested it thoroughly

> have you verified with async-profiler or another normal cpu sampler?

nope, haven't done that

> The warmup may also have significant perf impact before it settles down, if you have many types hitting the same cache concurrently. Question is how much it matters for your app.

that's pretty much the issue with my app: instances are getting overloaded on startup, so it matters a lot
but the good news is that with Caffeine and calling writeValueAsBytes on bean initialization we don't observe severe locking time on startup anymore, and even though it is still there, it doesn't seem to affect the instances
Thank you @Enigo. It is quite peculiar that there is this contention, but I'm glad that a combination of changes can resolve it.
I would still be interested in how much of the solution is the use of full Caffeine classes for the cache, vs. warm-up with writeValueAsBytes() (or similar).
ah, now there is somewhat of a different issue. I added the Caffeine cache like this:
```java
private static class CaffeineLookupCache<K, V> implements LookupCache<K, V> {
    private final ConcurrentMap<K, V> cache;

    CaffeineLookupCache(final int initialCapacity, final int maximumSize) {
        final Cache<K, V> cache = Caffeine.newBuilder()
                .initialCapacity(initialCapacity)
                .maximumSize(maximumSize)
                .build();
        this.cache = cache.asMap();
    }

    @Override
    public V putIfAbsent(final K key, final V value) {
        return cache.putIfAbsent(key, value);
    }
    ...
}
```
and then using it like this:

```java
final var objectMapper = new ObjectMapper();
objectMapper.setTypeFactory(TypeFactory.defaultInstance()
        .withCache(new CaffeineLookupCache<>(200, 1000)));
```
I see this warning in the logs:

> The cache is experiencing excessive wait times for acquiring the eviction lock. This may indicate that a long-running computation has halted eviction when trying to remove the victim entry. Consider using AsyncCache to decouple the computation from the map operation
and the stacktrace points to ObjectWriter.writeValueAsBytes:

```
java.util.concurrent.TimeoutException: null
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.lock(BoundedLocalCache.java:1569) ~[caffeine-3.1.8.jar:3.1.8]
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.afterWrite(BoundedLocalCache.java:1547) ~[caffeine-3.1.8.jar:3.1.8]
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.put(BoundedLocalCache.java:2314) ~[caffeine-3.1.8.jar:3.1.8]
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.putIfAbsent(BoundedLocalCache.java:2283) ~[caffeine-3.1.8.jar:3.1.8]
    at com.mycompany.json.CaffeineLookupCache.putIfAbsent(CaffeineLookupCache.java:75) ~[commons-utils-24.19.jar:24.19]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:1523) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromParamType(TypeFactory.java:1632) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromAny(TypeFactory.java:1395) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._resolveSuperClass(TypeFactory.java:1534) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:1493) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromParamType(TypeFactory.java:1632) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromAny(TypeFactory.java:1395) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._resolveSuperClass(TypeFactory.java:1534) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:1493) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory._bindingsForSubtype(TypeFactory.java:531) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.type.TypeFactory.constructSpecializedType(TypeFactory.java:510) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.SerializerProvider.constructSpecializedType(SerializerProvider.java:347) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.BeanPropertyWriter._findAndAddDynamic(BeanPropertyWriter.java:898) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.module.blackbird.ser.ObjectPropertyWriter.serializeAsField(ObjectPropertyWriter.java:75) ~[jackson-module-blackbird-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:774) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:178) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.module.blackbird.ser.ObjectPropertyWriter.serializeAsField(ObjectPropertyWriter.java:95) ~[jackson-module-blackbird-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:774) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:178) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1572) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ObjectWriter._writeValueAndClose(ObjectWriter.java:1273) ~[jackson-databind-2.14.2.jar:2.14.2]
    at com.fasterxml.jackson.databind.ObjectWriter.writeValueAsBytes(ObjectWriter.java:1163) ~[jackson-databind-2.14.2.jar:2.14.2]
    ...
```
I must admit I'm kind of clueless about what to do with this, so any hints would be greatly appreciated, thanks!
It's strange, as that usually occurs due to long computations which might block an eviction from a neighboring entry in the hash map. It could be that you flood ForkJoinPool.commonPool() so it's not evicting regularly, though a write should barge in. You can try executor(Runnable::run) and see if that helps.
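If I understand the suggestion correctly, that means adding one line to the Caffeine builder from the earlier snippet, so maintenance work runs on the calling thread instead of ForkJoinPool.commonPool(). A sketch (class name hypothetical):

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.ConcurrentMap;

// Sketch: same Caffeine setup as in the earlier snippet, but with
// eviction/maintenance run on the writer thread via executor(Runnable::run)
// instead of being scheduled on ForkJoinPool.commonPool().
class SameThreadCaffeineCache<K, V> {
    final ConcurrentMap<K, V> map;

    SameThreadCaffeineCache(int initialCapacity, long maximumSize) {
        Cache<K, V> cache = Caffeine.newBuilder()
                .initialCapacity(initialCapacity)
                .maximumSize(maximumSize)
                .executor(Runnable::run) // run maintenance on the calling thread
                .build();
        this.map = cache.asMap();
    }
}
```

The trade-off is that writers occasionally pay for maintenance inline, in exchange for not depending on the common pool being responsive.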
sorry for the late reply, didn't have time to properly look at the issue. what do you actually mean by

> You can try executor(Runnable::run) and see if that helps.

I use Caffeine for the Jackson ObjectMapper TypeFactory cache
It appears to be happening only on instance startup for some time. The longest I've seen was ~40 minutes of warnings in the logs, around 9k log entries. Then it stops and never reappears. So something must be overloading the cache at instance startup.
@Enigo overload during startup makes lots of sense, since that is also when (de)serializers get constructed, a big part of which is introspection of classes, methods, fields, constructors and all relevant types. And this tends not to happen later on, when all/most (de)serializers are cached and reused.
Although the earlier discussion did suggest that not all serializers are fully cached (possibly due to "wider" polymorphism for values being serialized than what the standard BeanSerializer caching expects). But the pattern itself makes some sense.
Search before asking
Describe the bug
Hello everyone! There is a curious issue I want to share regarding the TypeFactory::_typeCache and its underlying map.

But first a bit of context. We run high-load servers processing lots of requests (300k+ per second), which translates to 5-6k rps per single machine in the cluster. Those requests are all JSON, so we do lots of serialization and deserialization operations. Here is a very simplified flow: ObjectMappers (for serialization and deserialization) are created once and shared across the entire application.

Now onto the issue. We are running in AWS on 4xlarge machines and recently decided to migrate to the bigger 16x ones. That of course means that every single instance will be processing 4-5x more rps. And with that our app couldn't scale, and after running a few flamegraphs everything points to
LRUMap. I attached two images; they are from the same flamegraph, one for incoming and another for outgoing request processing, and lastly the lock time.

I've been looking around the internals of the Jackson lib and it seems to me that in our particular case the cache used in TypeFactory is not scaling with the instance size. Basically, when running bigger instances, the newly launched instances have all available threads busy for like 2-5 minutes, which pretty much makes the entire cluster sorta "dead".

I've tried a couple of optimizations that seemed to mitigate the issue, but it is still spending quite a lot of time waiting:
- ObjectWriter and ObjectReader instead of ObjectMapper
- a separate ObjectMapper for each of those converters from point (3)
- objectMapper.setTypeFactory(TypeFactory.defaultInstance().withCache(new LRUMap<>(10000, 10000)));
From a code POV, what we run are simple objectWriter.writeValueAsBytes and objectReader.readValue operations, and ObjectMapper is configured like this

Everywhere it is stated that ObjectMapper is thread-safe, which is true; but if this is indeed the expected behavior, that thread safety comes with quite a price tag, it seems.

Appreciate any feedback and hints on how to solve this, thanks!
Version Information
2.14.2
Reproduction
Expected behavior
No response
Additional context
No response