Question: MDC and concurrency

narerkrit-dotography commented 6 years ago

Hi,

I'm implementing a logging library for my team to use, mainly to handle log metadata (correlation id and the like). Most google results seem to point towards using MDC. I understand that slf4j's MDC is thread local, so there'll be issues while using it in concurrent processes.

Here are my questions:

Does log4s's MDC.withCtx provide any help regarding concurrency? Is there any recommended pattern to deal with MDC and concurrency, specific to log4s?

I'm trying to write a few tests to find out the answer for myself, but if I could get some answers from here before I'm done, that would greatly help me out.

Thank you.

sarahgerweck commented 6 years ago

Your options are basically either to use an MDC or to have some kind of mechanism that constructs custom log messages with a given format.

The MDC system is actually implemented not by Log4s or SLF4J, but by the underlying log framework that you choose (e.g., Logback, Log4j or Log4j2). I'm most familiar with Logback, and it uses thread-local storage without inheritance. This means that when code spawns a new thread, that new thread has an empty MDC instead of inheriting the MDC of the code that spawned it.

If you're working in Scala or a modern Java architecture, this non-inheritance of the MDC is probably critical. You're probably not spawning your own threads using something like new Thread: you probably are working with actors or existing executor pools. In this scenario, inheritance of thread-local variables is a disaster because you might write code like this:

MDC("requestId") = "12345"
try {
  val job = doSomeWork()
  logger.debug(s"Scheduling new job: $job")
  /* do some work  */
  executor.submit(job)
} finally {
  MDC -= "requestId"
}

You might think that this try/finally pairing guarantees that you'll correctly clean up the MDC—and this is true without thread-local inheritance. However, the way thread pools work is that submitting a job will trigger the creation of a new thread if the pool needs capacity. The submitted job may or may not run on that new thread. If you have thread-local inheritance, a random new thread will permanently have the requestId of the code that happened to trigger the pool's expansion.

So the short of it is that inheritance of thread-local variables is basically incompatible with modern threading architectures on the JVM where you're using anything from java.util.concurrent, Akka or scala.concurrent. Modern versions of Logback do not use inheritance: you should verify this for yourself if you're using a different library. (If you happen to check, I'd appreciate a note so I can add a table to the README.)

When dealing with the MDC in concurrency, you should always do something like the code above where you set the MDC, do some work, and then clear the MDC. This is precisely what withCtx does for you, so yes, I recommend using withCtx in most cases. The only case where I wouldn't use it is if you have complicated flow control in a tight loop, as setting and restoring the MDC adds some overhead that could be problematic if you do it billions of times.

If you're using Futures, you'll need to set the MDC within any asynchronous code block, as they run outside the main thread and cannot safely inherit thread-local variables.

E.g., you would have to write code that looks like this:

MDC.withCtx("requestId" -> rid) {
  logger.info(…)
  Future {
    MDC.withCtx("requestId" -> rid) {
      …
    }
  } flatMap { r1 =>
    MDC.withCtx("requestId" -> rid) {
      …
    }
  } map { r1 =>
    MDC.withCtx("requestId" -> rid) {
      …
    }
  } recover { case t =>
    MDC.withCtx("requestId" -> rid) {
      …
    }
  }
}

All of those context-set points are required if you want the MDC to be available throughout the full code, as each of the five blocks is effectively in a different thread. (They're async callbacks running in some thread pool. There's a good chance they'll actually run on the same physical thread, but the framework can't rely on that in any way.)

I have been contemplating a couple of potential new APIs that would decouple the MDC from thread-local storage. There are two ways this might work.

Option 1: pinned-context logger

val logger = getLogger
val reqLogger = logger.withMDC("requestId" -> rid)
reqLogger.info(…)
Future {
  reqLogger.info(…)
}

Option 2: implicit-context logger

val logger = getImplicitLogger
implicit val mdc = MDC.implicit("requestId" -> rid)
logger.info(…)
Future {
  logger.info(…)
}

I'm far more likely to do option number two, because it's more powerful and implicits are the natural way to represent contextual state like this. The implementation of either of these would effectively set the underlying thread-local MDC on a per-message basis and immediately restore it afterwards. This imposes some overhead, but users could still use manual MDC management if they wanted.

One thing I would caution: writing your own version of Logger that wraps it has some pitfalls and you need to be careful about performance. (Most importantly, you need to make sure that any operations remain inexpensive when the logger is disabled, or you can turn disabled trace logging into a serious bottleneck.) If you're just trying to provide a uniform way to inject a diagnostic context, you may want to just provide a standard object that people can use to get the MDC state they need.

E.g.,

trait HasLogContext {
  protected[this] def context: Seq[(String, String)]
  def withCtx[A](f: => A): A = MDC.withCtx(context: _*)(f)
}
class ClientRequest(id: String) extends HasLogContext {
  override protected[this] def context = Seq("reqId" -> id)
}
object Main {
  def handle(req: ClientRequest) = req.withCtx {
    /* Use log4s directly in here */
  }
}

I'll probably evolve some of this into supplementary documentation, but hopefully this was useful.

narerkrit-dotography commented 6 years ago

Wow. That was above and beyond. Thank you very much.

narerkrit-dotography commented 6 years ago

Side note, I'm also planning to use Logback implementation.

Log4s / log4s

Question: MDC and concurrency #21