FasterXML / jackson-jaxrs-providers

Multi-module project that contains Jackson-based "old" JAX-RS (ones under `javax.ws.rs`) providers for JSON, XML, YAML, Smile, CBOR formats
Apache License 2.0
109 stars 77 forks source link

`ProviderBase` class shows contention on synchronized block using `LRUMap` _writers instance #166

Closed jllanes3 closed 1 year ago

jllanes3 commented 1 year ago

Libraries I am using:

io.dropwizard:dropwizard-core:2.14.0
jackson-jaxrs-base-2.13.1.jar

I am seeing thread contention on the following line of ProviderBase.java.

image

Thread that locked the _writers instance:

"pool-abc-3846" #3846 prio=5 os_prio=0 tid=0x00007f75f01ee800 nid=0xfd4 runnable [0x00007f74de6e9000]
   java.lang.Thread.State: RUNNABLE
    at java.lang.reflect.WeakCache.containsValue(WeakCache.java:175)
    at java.lang.reflect.Proxy.isProxyClass(Proxy.java:791)
    at sun.reflect.annotation.AnnotationInvocationHandler.asOneOfUs(AnnotationInvocationHandler.java:225)
    at sun.reflect.annotation.AnnotationInvocationHandler.equalsImpl(AnnotationInvocationHandler.java:201)
    at sun.reflect.annotation.AnnotationInvocationHandler.invoke(AnnotationInvocationHandler.java:64)
    at com.sun.proxy.$Proxy56.equals(Unknown Source)
    at com.fasterxml.jackson.jaxrs.cfg.AnnotationBundleKey._equals(AnnotationBundleKey.java:135)
    at com.fasterxml.jackson.jaxrs.cfg.AnnotationBundleKey.equals(AnnotationBundleKey.java:116)
    at java.util.HashMap.getNode(HashMap.java:573)
    at java.util.LinkedHashMap.get(LinkedHashMap.java:440)
    at com.fasterxml.jackson.jaxrs.base.ProviderBase._endpointForWriting(ProviderBase.java:681)
    - locked <0x00000003c2a0f580> (a com.fasterxml.jackson.jaxrs.util.LRUMap)
    at com.fasterxml.jackson.jaxrs.base.ProviderBase.writeTo(ProviderBase.java:557)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.invokeWriteTo(WriterInterceptorExecutor.java:242)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.aroundWriteTo(WriterInterceptorExecutor.java:227)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)
    at io.dropwizard.jersey.errors.EofExceptionWriterInterceptor.aroundWriteTo(EofExceptionWriterInterceptor.java:39)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)
    at org.glassfish.jersey.server.internal.JsonWithPaddingInterceptor.aroundWriteTo(JsonWithPaddingInterceptor.java:85)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)
    at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:61)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)
    at org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1116)
    at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:642)
    at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:373)
    at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:363)
    at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$3.run(ServerRuntime.java:864)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
    at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:896)
    at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:852)
        ...

Thread (1 of many!) that is waiting for such lock:

    "pool-abc-3478" #3478 prio=5 os_prio=0 tid=0x00007f768c903000 nid=0xe3b waiting for monitor entry [0x00007f74f4343000]
    java.lang.Thread.State: BLOCKED (on object monitor)
     at com.fasterxml.jackson.jaxrs.base.ProviderBase._endpointForWriting(ProviderBase.java:680)
     - waiting to lock <0x00000003c2a0f580> (a com.fasterxml.jackson.jaxrs.util.LRUMap)
     at com.fasterxml.jackson.jaxrs.base.ProviderBase.writeTo(ProviderBase.java:557)
     at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.invokeWriteTo(WriterInterceptorExecutor.java:242)
     at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.aroundWriteTo(WriterInterceptorExecutor.java:227)
     at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)
     at io.dropwizard.jersey.errors.EofExceptionWriterInterceptor.aroundWriteTo(EofExceptionWriterInterceptor.java:39)
     at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)
     at org.glassfish.jersey.server.internal.JsonWithPaddingInterceptor.aroundWriteTo(JsonWithPaddingInterceptor.java:85)
     at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)
     at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:61)
     at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139)
     at org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1116)
     at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:642)
     at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:373)
     at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:363)
     at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$3.run(ServerRuntime.java:864)
     at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
     at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
     at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
     at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
     at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
     at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
     at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:896)
     at org.glassfish.jersey.server.ServerRuntime$AsyncResponder.resume(ServerRuntime.java:852)
         ....

JMC for a 1 min JFR:

image

Full code:

    protected EP_CONFIG _endpointForWriting(Object value, Class<?> type, Type genericType,
            Annotation[] annotations, MediaType mediaType, MultivaluedMap<String,Object> httpHeaders)
    {
        // 29-Jun-2016, tatu: As per [jaxrs-providers#86] allow skipping caching
        if (!isEnabled(JaxRSFeature.CACHE_ENDPOINT_WRITERS)) {
            return _configForWriting(locateMapper(type, mediaType), annotations, _defaultWriteView);
        }

        EP_CONFIG endpoint;
        AnnotationBundleKey key = new AnnotationBundleKey(annotations, type);
        synchronized (_writers) {
            endpoint = _writers.get(key);
        }
        // not yet resolved (or not cached any more)? Resolve!
        if (endpoint == null) {
            MAPPER mapper = locateMapper(type, mediaType);
            endpoint = _configForWriting(mapper, annotations, _defaultWriteView);
            // and cache for future reuse
            synchronized (_writers) {
                _writers.put(key.immutableKey(), endpoint);
            }
        }
        return endpoint;
    }

Are you aware if this is expected say when moving to Dropwizard async request handling (i.e. using AsyncResponder.resume(...))?

If this is by design, are you aware of any workarounds when using Dropwizard + Jersey + Jackson (in async mode) that could help with this contention?

pjfanning commented 1 year ago

@cowtowncoder since jackson v2.14 and the changes in LRUMap to use a ConcurrentLinkedHashMap implementation, the synchronization around _writers and _readers in ProviderBase may not be needed.

cowtowncoder commented 1 year ago

@pjfanning You are absolutely right -- since implementation is synchronized, sync here should not be needed (and ditto for jackson-jakarta-rs-providers).

cowtowncoder commented 1 year ago

Ok, fixed via #167, to be included in 2.14.2 / 2.15.0.

jllanes3 commented 1 year ago

Thanks for fixing this super fast. Do you know when we could expect to see 2.14.2 out? I really need the fix, and I want to plan around the release date if possible.

cowtowncoder commented 1 year ago

I am hoping to get 2.14.2 released in couple of weeks; might be very soon but cannot promise due to fluctuating time I have: each full patch release takes 2.5-3 hours of my time.

e-hubert-opti commented 6 months ago

@cowtowncoder I think the assumption the used LRUMap to be thread-safe because being backed by a concurrent implementation was not correct here and causes unbound growth of this map. Could it be that you confused com.fasterxml.jackson.jaxrs.util (used by ProviderBase) with com.fasterxml.jackson.databind.util.LRUMap? Am going to file a bug against Jersey 2.41, but it looks like they only ported this change into their code base. So likely someone should also file an issue with Jackson.

cowtowncoder commented 6 months ago

@e-hubert-opti Whoa! You are right: I had forgotten about copy-pasted version of LRUMap. Yes, I think we need to change JAX-RS/Jakarta-RS providers to use jackson-databind version, which was rewritten in 2.14.

cowtowncoder commented 6 months ago

Fixed:

e-hubert-opti commented 6 months ago

Thank you very much for this ultra fast response @cowtowncoder. I adjusted my unit test to validate the behavior of the LRUMap implementation and changed the import to com.fasterxml.jackson.databind.util.LRUMap (version 2.16.0) and it also fails:

@Test
void test_concurrent_usage_of_lru_map() {
    LookupCache<Integer, Integer> lruMap = new LRUMap<>(16, 120);
    ExecutorService executorService = Executors.newFixedThreadPool(10);

    for (int i = 0; i < 100000; i++) {
        int tmp = i;
        executorService.submit(() -> lruMap.put(tmp, tmp));
    }

    assertThat(lruMap.size()).isLessThanOrEqualTo(120);
}

java.lang.AssertionError: Expecting actual: 5242 to be less than or equal to: 120

Synchronizing the map with Collections.synchronizedMap() turns it green. I will try to have a closer look at the LRUMap implementation of jackson-databind, but wanted to quickly provide feedback.

e-hubert-opti commented 6 months ago

Ah, all good. I had a look in the implementation of PrivateMaxEntriesMap and noticed the async draining, so it is just the test which needs to be adjusted for this.

cowtowncoder commented 6 months ago

Yeah older versions of LRUMap suffered from various problems; original (one copied here) was not thread-safe wrt mutations; next variation was safe, had flush but no actual LRU logic. Latest version is bit of an overkill, but should work correctly and efficiently, even if not very memory efficiently.