Improve logging context integration

rocketraman commented 6 months ago

Using log4j2 context seems a bit limited for non-thread use cases. The assumption that context will always be set at the Thread or coroutine level is IMO false, and sometimes we just want to set context for a particular block of code, or for a particular logger.

We should have a way to do something like this outside of a coroutine / suspending context:

withContextMap(...) {
  // log something
}

Possible implementations (only map shown here, but similar for stack)?

fun <T> withContextMap(
  map: Map<String, String>,
  block: () -> T,
): T {
  val oldMap = ContextMap.view
  return try {
    ContextMap += map
    block()
  } finally {
    ContextMap.clear()
    ContextMap += oldMap
  }
}

Could also be related to https://github.com/apache/logging-log4j-kotlin/issues/36. We may want to do something like:

val log = parentLog.withContext(...)

...

log.debug { ... }

vy commented 6 months ago

I think this is related to apache/logging-log4j2#2213.

rocketraman commented 6 months ago

I think this is related to [apache/logging-log4j2#2213].

@vy It seems so, but I'm unsatisfied by the answers in the discussion.

Let me see if I can illustrate my use case with a generic example.

Assume we have this setup:

class T {
  val log = logger()

  val r: R

  fun doSomethingInThread() {
    r.doSomethingOnResource()
    log.info { "bar" }
  }
}

class R {
  val log = logger()

  fun doSomethingOnResource() {
    log.info { "foo" }
  }
}

T represents some kind of thread. It could be an HTTP request thread for example. So we have some diagnostic context related to that thread, say a request id.

R represents some kind of resource used by T. It could be a state machine or a physical resource or whatever. There is some diagnostic context related to this resource, say a resource name or id.

Inside doSomethingOnResource when logging "foo", both the thread and resource diagnostic contexts are relevant.

However, inside doSomethingInThread i.e. the "bar" log, only the thread context is relevant.

If doSomethingOnResource were to set the context for the resource in the ThreadContext before the foo log, then the bar log would also show the resource context, which would be incorrect.

The thread or coroutine attached context works well for T but doesn't work well at all for R, which is used within the thread of T.

vy commented 6 months ago

@rocketraman, AFAIC, the problem is the lack of a dynamically scoped variable (you might be familiar with this concept if you have ever used a Lisp dialect), and hence, we use the one closest to that: a thread-local one. Imagine ThreadContext is a ScopedValue. Your withContextMap(map: Map<String, String>, block: () -> T) could simply be represented as ThreadContext.runWhere("mdcKey", "mdcVal", doStuff()). Am I missing something?

rocketraman commented 6 months ago

Am I missing something?

@vy Yes, I think so. Or I am :-)

If you look back at my example, logging in R is not (only) thread-scoped, it is both thread-scoped AND class-scoped. In other words, I want all logging in R to show the specific context of R in addition to the thread-scoped context that is executing the code.

rocketraman commented 6 months ago

I think the ideal solution for this would look something like this, and could leverage the solution we come up with for https://github.com/apache/logging-log4j-kotlin/issues/36:

class R {
  val log = logger().withContext(...)

  fun doSomethingOnResource() {
    log.info { "foo" }
  }
}

rgoers commented 6 months ago

Please see #2385 as this should do what you want - for Java. I am not sure how to incorporate this into Kotlin.

You can now do:

ScopedContext.newInstance().where("key1", "value1").where("key2", "value2").run(() -> {
    logger.debug("This will include key1 and key2 in the ThreadContextMap of the LogEvent");
 });

or you can do

boolean result = ScopedContext.newInstance().where("key1", "value1").where("key2", "value2").call(() -> {
    logger.debug("This will include key1 and key2 in the ThreadContextMap of the LogEvent");
    return true;
 });

In addition, nesting is supported as are arbitrary objects. To include the key/value pairs of the parent you would do:

ScopedContext.newInstance().where("user", userObject).run(() -> {
   ScopedContext.newInstance(true).where("key1", "value1").run() -> {
          logger.debug("This will include the String representation of the user and key1");
    });
});

If the object implements the Renderable interface then it can control the formatting of the object. Otherwise, toString() is used.

ppkarwasz commented 6 months ago

If you look back at my example, logging in R is not (only) thread-scoped, it is both thread-scoped AND class-scoped. In other words, I want all logging in R to show the specific context of R in addition to the thread-scoped context that is executing the code.

There are already two sources for context data:

the contents of the thread-bound context map,
the contents of the <Property> elements bound the logger in the configuration:

<Logger name="resource">
    <Property name="resourceId" value="abcd"/>
</Logger>

Is this what you are missing?

rocketraman commented 6 months ago

@rgoers I think you missed the class-scoped (not thread-scoped) context in my example.

@ppkarwasz Ah interesting -- if I could bind Property dynamically then yes, that would work. For example, lets say I have 3 instances of class R, all with different values of resourceId. They all use the same logger name, but each have a different resourceId. I haven't looked deeply but it doesn't seem there is any API for this.

rgoers commented 6 months ago

@rocketraman I guess I am not grasping at all what you want. With my PR you can now do:

class T {
  private static final Logger log = LogManger.getLogger(T.class);

  R r = new R();

  doSomethingInThread() {
   ScopedContext.newInstance().where("context", "thread").run(() -> {
          r.doSomethingOnResource();
          log.info{ "bar");
      });
  }
}

class R {
 private static final Logger log = LogManger.getLogger(R.class);

  fun doSomethingOnResource() {
    log.info("context will be 'thread' here);
    ScooedContext.newInstance().where("context", "resource").run(() -> {
        log.info { "foo" };
    )};
  }
}

In this example the value of "context" when bar is logged would be "thread" and would be "resource" when foo is logged.

so everything inside the run method, and only inside the run method contains the data. In the example above the value of context will vary depending on which ScopedContext you are executing in. Isn't this what you wanted?

Note, that while the PR uses a ThreadLocal it has to do that to make it thread-safe. Each Thread has its own stack of contexts.

Now it would be great if instead you could annotate the class or method with the scope but due to their limitations I am not sure how you could actually populate the context since properties associated with annotations have to be known at compile time.

rocketraman commented 6 months ago

@rgoers Yes, in my original post I proposed a method withContextMap that would be used in Kotlin the same way as ScopedContext (though its a bit nicer in Kotlin).

Now it would be great if instead you could annotate the class or method with the scope but due to their limitations I am not sure how you could actually populate the context since properties

I think with some kind of LogBuilder-type API this could be possible.

rgoers commented 6 months ago

@rocketraman Again, I don't understand. ScopedContext IS a builder. Each where method adds a new key/value pair to the context. The run and call methods are, in essence, the builders. Of course, that isn't strictly true since newInstance creates the object, but the ScopedContext cannot reallly be used for anything until run or call are called.

I also don't see how a builder could be leveraged from an Java annotation.

Note: I very loosely used Java 21 ScopedValues for the API for this, although ScopedContext is quite a bit simpler IMO. The where method in scoped variables creates a new immutable ScopedValue.Carrier - I believe the JDK has gone a bit overboard in making things immutable.

ppkarwasz commented 6 months ago

I think we should discuss it in dev@logging. I started a thread dedicated to this subject.

rocketraman commented 6 months ago

Again, I don't understand. ScopedContext IS a builder.

@rgoers ScopedContext is a builder for the context. I was talking about a builder for the logger. The thread that @ppkarwasz started is conceptually the same as what I was talking about:

Regarding problem 1, I don't know what the best approach would be. Maybe we can add withContextData to LogBuilder and create a BoundLogger interface with methods to create LogBuilder instance pre-filled with some context data.

rgoers commented 6 months ago

OK. Personally, I see little value in adding context data to a logger. That simply wouldn't fit with any use cases I work with. However, it might if Logger names were more generic instead of using class names as is done now by convention.

rocketraman commented 6 months ago

OK. Personally, I see little value in adding context data to a logger. That simply wouldn't fit with any use cases I work with. However, it might if Logger names were more generic instead of using class names as is done now by convention.

@rgoers For context (pun intended, hah!), I've recently run into this situation in a project which manages TCP/IP connections and the related state of a networking protocol. The Connection class represents a particular network connection with a UUID and local and remote socket address (context!). It uses a standard class-name based logger.

The Connection resource class can be called from multiple threads, each of which can do some operation on the connection. There are therefore two sources of context:

1) The connection context i.e. the logger in the Connection object should contain the UUID and socket addresses in the context, which is very useful when using logs to answer the question "what is everything that has happened to this particular connection?".

2) The context would also contain information about the thread (or in my case, coroutine) that called whatever method on the connection. Which is very useful when answering the question "why did this happen to this connection?".

This is the first project in which I've used MDC extensively, and when encountering this gap with the former of the two context sources above, I was honestly surprised it didn't already have a good, simple solution. It seems like a common use case to me.

I think traditionally most developers would just include the connection context in the message of each log, and then search/grep for it. This works of course, but is brittle (easy to mess up) and far inferior to populating the context when using structured logging.

rgoers commented 6 months ago

@rocketraman I understand your example but I am not sure how it works in practice. I have to presume that the Connection class can operate on multiple Connection instances. A Logger is a singleton - there will only be one instance no matter how many Connection instances you create (to do otherwise would create an insane number of Logger instances in the system). We generally recommend it be declared as static as to do otherwise means a call is made to locate the Logger instance every time a new instance of the class is created. What this means is that you can't really have a per-resource context attached to a Logger. From inside Log4j we would have no idea which instance of the object was used and would not be able to locate the context.

However, you could create an instance of the ScopedContext I proposed and attach it to each instance of your Connection and then invoke the logic for every method inside of a run method. It should be possible to use AOP to use an annotation to cause that to happen automatically.

rocketraman commented 6 months ago

I understand your example but I am not sure how it works in practice. I have to presume that the Connection class can operate on multiple Connection instances. A Logger is a singleton - there will only be one instance no matter how many Connection instances you create (to do otherwise would create an insane number of Logger instances in the system). We generally recommend it be declared as static as to do otherwise means a call is made to locate the Logger instance every time a new instance of the class is created. What this means is that you can't really have a per-resource context attached to a Logger. From inside Log4j we would have no idea which instance of the object was used and would not be able to locate the context.

@rgoers I'm talking about a builder which is not static. So imagine if you will a static logger in the usual way. Then on top of that, you create an object-level LogBuilder which contains your object-level context, which all subsequent logging sites use as a starting point. For example in pseudo-code:

class R {
  private static final Logger baseLogger = LogManger.getLogger(R.class);

  private final id = ...
  private final logger = baseLogger.withContext(id)

  fun doSomethingOnResource() {
    logger.atInfo().log("foo");
  }
}

However, you could create an instance of the ScopedContext I proposed and attach it to each instance of your Connection and then invoke the logic for every method inside of a run method. It should be possible to use AOP to use an annotation to cause that to happen automatically.

Yes, this is what I was proposing in my OP with withContextMap.

This is an ok solution but its verbose, and I personally dislike using AOP. In my experience, AOP is almost always obviated by good architecture and APIs. Kotlin's trailing lambda syntax and delegation support makes AOP even more redundant for Kotlin users. You may disagree but I don't think log4j2 should require users to bring AOP into a project in order to have a clean API surface.

apache / logging-log4j-kotlin

Improve logging context integration #71