eclipse-vertx / vert.x

Vert.x is a tool-kit for building reactive applications on the JVM
http://vertx.io
Other
14.25k stars 2.06k forks source link

Contex data didn't clean before return to thread pool #4177

Open glassfox opened 2 years ago

glassfox commented 2 years ago

Context

All data that previously has been persisted in the context, available in next iteration of thread.

Following test:

import java.util.Objects;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;

import io.vertx.core.AbstractVerticle;
import io.vertx.core.Vertx;
import io.vertx.core.impl.ContextInternal;
import io.vertx.junit5.VertxExtension;
import io.vertx.junit5.VertxTestContext;

@ExtendWith(VertxExtension.class)
public class WorkingContextTest {

    private static final String CONTEXTUAL_KEY = "val";

    @Test
    void test(Vertx vertx, VertxTestContext ctx) {
        vertx.deployVerticle(new AbstractVerticle() {
            @Override
            public void start() throws Exception {
                AtomicInteger cnt = new AtomicInteger();
                vertx.setPeriodic(100, id -> {
                    var actual = get(CONTEXTUAL_KEY);
                    if(null == actual) {
                        var expected = UUID.randomUUID().toString();
                        put(CONTEXTUAL_KEY, expected);

                        if(cnt.incrementAndGet() > 100) {
                            ctx.completeNow();
                        }
                    } else {
                        vertx.cancelTimer(id);
                        ctx.failNow("context not empty. value:" + actual);
                    }
                });
            }
        });
    }

    private static String put(String key, String value) {
        Objects.requireNonNull(key);
        Objects.requireNonNull(value);
        ContextInternal ctx = (ContextInternal) Vertx.currentContext();
        if (ctx == null) {
            return null;
        } else {
            return contextualDataMap(ctx).put(key, value);
        }
    }

    public static String get(String key) {
        Objects.requireNonNull(key);
        ContextInternal ctx = (ContextInternal) Vertx.currentContext();
        if (ctx != null) {
            return contextualDataMap(ctx).get(key);
        }
        return null;
    }

    private static ConcurrentMap<String, String> contextualDataMap(ContextInternal ctx) {
        Objects.requireNonNull(ctx);
        return (ConcurrentMap<String, String>) ctx.localContextData().computeIfAbsent(WorkingContextTest.class, k -> new ConcurrentHashMap<String,String>());
    }
}
vietj commented 2 years ago

that is expected with setPeriodic

local context data is never cleaned, it is just that sometimes we create a new context e.g with HTTP request.

perhaps the same should apply with setPeriodic.

glassfox commented 2 years ago

@vietj After short investigation I found the same behavior of "durty" context in inboundInterceptor event of eventBus.

vietj commented 2 years ago

it's not dirty, such context with local data are expected to be garbaged by the JVM

On Sun, Nov 28, 2021 at 3:31 PM glassfox @.***> wrote:

@vietj https://github.com/vietj After short investigation I found the same behavior of "durty" context in inboundInterceptor event of eventBus.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/eclipse-vertx/vert.x/issues/4177#issuecomment-981095471, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXDCUR7IU2NH7EBA6T5L3UOI4NDANCNFSM5IWVDFZA .

glassfox commented 2 years ago

In my case "durty" context behavior create unexpected activity of contextual tracing.

dano commented 2 years ago

FWIW, I was also bitten by this when I started making use of the local map in my codebase. In general, I found lots of bugs where data in the context got mismatched because of accidentally putting local data in a Context instance that was not a duplicate. Now, one option is to have all the callbacks for setTimer/setPeriodic run in their own duplicated context (which IMO is the behavior of least surprise from a user's perspective), but I have a more general concern with using putLocal/getLocal with non-duplicated contexts.

I know that Quarkus now provides a wrapper API (as Clement explained to me here ) that actually prevents you from doing this - it throws an error if the Context you are trying to store local data in is not a duplicate. I wonder if Vert.x itself should consider some kind of protections here, as well? Are there valid use-cases for using putLocal/getLocal with a non-duplicated Context? I guess it has the advantage of the local map not being inherited by duplicates created from the original Context, but I'm not sure what practical use that has.

I'm not sure completing blocking calls to those APIs make sense, but my suspicion is that most times code is calling putLocal on a non-duplicated Context, it's an accident. Maybe just put a warning in the logs (that can be turned off, I guess?) when someone tries it, or maybe the default behavior is to throw an error, but there's an override option that lets you force it to happen if you have a good reason to store local data in a shared Context? Other ideas? Or maybe it's just up to the user to figure this out for themselves, and it's not a big issue if Vert.x makes sure duplicate Context instances are used by default in more places?

glassfox commented 2 years ago

After a several period of research of the impact of "dirty" context I found a wide scale of different parts/tools of the framework that behave not stable:

  1. Timer: in case of implementation of timer, is not a big issue to clean context on each call. But timer has been implemented in different built-In tools (like CircuitBreaker). Timer has been executed in short period and publish metrics via eventbus. Lot of "dirty" context data has been moved and generate irrelevant activity. No option to clean context, just disable metrics publish. I think that exist more tools that use Timer inside, and required some workaround to prevent activity or clean context. Currently as solution I have duplicated classes: ContextualTimer, ContextualCircuitBeaker, etc.
  2. Resource clients with thread pool (Redis, Sql, etc.): Not sure about all modules, but I found inconsistent behavior of RedisClient for "dirty" Context. Validation of context before and after "batch" action (in my case) hardly ever, but not the same. Currently I spread context validation in a lot of clients (ElasticSearch, Http, Kafka, Redis, Sql) to understand the impact.

Finally: The "dirty" context take a lot of time for investigation and stability of dataflow. Hope that we will find together a simple quick and bulletproof solution for this.