DataDog / dd-trace-java

Datadog APM client for Java
https://docs.datadoghq.com/tracing/languages/java
Apache License 2.0

Document how log injection works #3522

Open strowk opened 2 years ago

strowk commented 2 years ago

We are using Logback and SLF4J, plus the logstash encoder to output JSON in our logs. Lately I have been working on adding trace ID injection.

Reading the docs here: https://docs.datadoghq.com/tracing/connect_logs_and_traces/java/?tab=slf4jandlogback it looked like it would be very simple; in practice, however, this turned out to be much more complicated and required reading the sources of this library. I believe the reason is the lack of documentation on the subject.

First of all, "Starting in version 0.74.0, the Java tracer automatically injects trace correlation identifiers into logs" is not really true, because we still had to write code that takes the trace and span IDs from the MDC and puts them into the JSON output. Mostly because we (and probably a lot of other projects) already have customizations there, so whatever the agent does that is supposed to work automatically did not actually work.

Then I looked at the "Manual injection" paragraph and figured I could just make my own provider that takes the trace ID from the MDC and writes it to the JSON. It looked like this (in Kotlin):

```kotlin
import ch.qos.logback.classic.spi.ILoggingEvent
import com.fasterxml.jackson.core.JsonGenerator
import net.logstash.logback.composite.AbstractFieldJsonProvider
import org.slf4j.MDC

class DatadogProvider : AbstractFieldJsonProvider<ILoggingEvent>() {

    override fun writeTo(generator: JsonGenerator, event: ILoggingEvent) {
        // Read the correlation identifiers from the thread-local MDC
        val traceId = MDC.get("dd.trace_id")
        traceId?.let {
            generator.writeStringField("trace_id", it)
        }
        val spanId = MDC.get("dd.span_id")
        spanId?.let {
            generator.writeStringField("span_id", it)
        }
    }
}
```

And it did in fact work up to agent version 0.76.0, where this change broke the approach: https://github.com/DataDog/dd-trace-java/pull/2440

Interestingly, the changelog does not even list this as a breaking change (which it of course is). It is listed as "Log injection enabled by default", while the PR itself is named "Log injection rework", which is a very big difference.

Reading the code of the change, I figured out that I need to read getMDCPropertyMap from the ILoggingEvent rather than calling MDC.get.

Now I had to rewrite my provider like this:

```kotlin
import ch.qos.logback.classic.spi.ILoggingEvent
import com.fasterxml.jackson.core.JsonGenerator
import net.logstash.logback.composite.AbstractFieldJsonProvider

class DatadogProvider : AbstractFieldJsonProvider<ILoggingEvent>() {

    override fun writeTo(generator: JsonGenerator, event: ILoggingEvent) {
        // It is important to use event.mdcPropertyMap, which is instrumented by
        // the Datadog Java agent, rather than MDC.get, which is not
        val traceId = event.mdcPropertyMap["dd.trace_id"]
        traceId?.let {
            generator.writeStringField("trace_id", it)
        }
        val spanId = event.mdcPropertyMap["dd.span_id"]
        spanId?.let {
            generator.writeStringField("span_id", it)
        }
    }
}
```

This version appears to work with the latest agent.
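
For completeness, a custom provider like this also has to be registered in logback.xml. With the logstash-logback-encoder's composite encoder, the wiring looks roughly like this (the package `com.example` is illustrative; use wherever the provider class actually lives):

```xml
<configuration>
  <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
      <providers>
        <timestamp/>
        <logLevel/>
        <message/>
        <!-- the custom provider shown above -->
        <provider class="com.example.DatadogProvider"/>
      </providers>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="JSON"/>
  </root>
</configuration>
```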

I believe the documentation lacks any detail on how this works, and it should be expanded significantly.

bantonsson commented 2 years ago

Hi @strowk, it seems there are a few things that are mixed up here.

  1. The tracer automatically injects the correlation identifiers into the log message via the MDC, but it has no way of knowing how your log appender wants to write that information. That is why you need to provide a pattern that contains trace_id and span_id in the raw format, or pull the information out in your own code if you don't use a standard JSON appender.
  2. The manual injection code you reference is just that: manual injection of the trace and span IDs. What you need in order to write them out is to access them from the MDC. The fact that the thread local backing the MDC happened to work in your appender code up until 0.76.0 (if it really worked correctly for an async appender) was not by design. There is no guarantee that an ILoggingEvent is processed synchronously on the thread that logged the message, which would be required for the thread-local MDC to be correct.
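
To illustrate the async point, here is a self-contained sketch (a plain thread-local map standing in for the real MDC) of why reading the thread local from an appender thread fails, while a snapshot captured at log time, like the one carried by ILoggingEvent#getMDCPropertyMap, does not:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncMdcSketch {
    // Toy stand-in for the SLF4J MDC: a plain thread-local map.
    static final ThreadLocal<Map<String, String>> MDC =
            ThreadLocal.withInitial(HashMap::new);

    // Returns {value read via the thread local on another thread,
    //          value read via a snapshot captured at log time}.
    static String[] readBothWays() throws Exception {
        MDC.get().put("dd.trace_id", "1234567890");

        // What the logging event carries: a copy taken on the logging thread.
        Map<String, String> snapshot = new HashMap<>(MDC.get());

        // An async appender processes events on a different thread.
        ExecutorService appenderThread = Executors.newSingleThreadExecutor();
        try {
            String viaThreadLocal =
                    appenderThread.submit(() -> MDC.get().get("dd.trace_id")).get();
            String viaSnapshot =
                    appenderThread.submit(() -> snapshot.get("dd.trace_id")).get();
            return new String[] {viaThreadLocal, viaSnapshot};
        } finally {
            appenderThread.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] r = readBothWays();
        System.out.println("thread-local on appender thread: " + r[0]); // null
        System.out.println("event snapshot on appender thread: " + r[1]); // 1234567890
    }
}
```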

I think that examples of how the internals of different logging frameworks work, and how to write a custom log appender, are out of scope for the Datadog documentation.

randomanderson commented 2 years ago

Starting in version 0.74.0, the Java tracer automatically injects trace correlation identifiers into logs

This part of the documentation refers to the fact that up until 0.74.0 you needed to set dd.logs.injection=true. After 0.74.0, log injection is on by default. We test 27+ different logging combinations (https://github.com/DataDog/dd-trace-java/pull/2368) with multiple versions of each library (https://github.com/DataDog/dd-trace-java/blob/master/dd-smoke-tests/log-injection/log-injection.gradle). Log injection is automatic for the vast majority of users.

For your specific setup: with SLF4J + Logback + the logstash encoder you should not need to write your own provider. By default, the MdcJsonProvider writes all fields in the event's MDC to the generated JSON. If you don't want the entire MDC to be written, you can set includeMdcKeyNames to dd.trace_id, dd.span_id, dd.service, dd.env, and dd.version.
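
A minimal logback.xml along those lines might look like this (a sketch assuming logstash-logback-encoder defaults; adjust the appender name and level to your setup):

```xml
<configuration>
  <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <!-- Restrict MDC output to the Datadog correlation keys -->
      <includeMdcKeyName>dd.trace_id</includeMdcKeyName>
      <includeMdcKeyName>dd.span_id</includeMdcKeyName>
      <includeMdcKeyName>dd.service</includeMdcKeyName>
      <includeMdcKeyName>dd.env</includeMdcKeyName>
      <includeMdcKeyName>dd.version</includeMdcKeyName>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="JSON"/>
  </root>
</configuration>
```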

strowk commented 2 years ago

Hi @bantonsson, thanks for the quick answer!

it has no way of knowing how your log appender wants to write that information

I guessed that, but the documentation states "Java tracer automatically injects trace correlation identifiers into logs", which kind of contradicts this. Right now it takes a lot of work to figure out what to do to make it work, while the documentation makes it sound easy.

The fact that the thread local that represents the MDC happened to work in your appender code up until 0.76.0, if it really worked correctly for an async appender, was not by design

Please document what the "design" is. Right now the design seems to exist only in the code or in the heads of the tracer's developers. That is not how it should be. If the tracer injects tags into the MDC and developers are supposed to work with them, there should be an explanation of that, because it is effectively an API between the tracer and the developer using it. There is no information about why MDC.get is not supposed to work while event.getMDCPropertyMap is.

examples on how the internals of different logging frameworks work, and how to write a custom log appender is out of scope for the Datadog documentation

I disagree. The documentation has "Connecting Java Logs and Traces" and the tab "SLF4J and Logback", which I believe are the beginnings of full, correct documentation. And again, the tracer has an API through which it exposes the tags. How is documenting this API out of scope? This final step is something every developer using this system has to perform at some point. If there is no clear explanation of how the tracer provides trace_id to the logging framework, everyone has to read the code and guess for themselves, which requires at least some knowledge of how the Java agent works. This is exactly what documentation exists to avoid, so that it is easier to start using tracing.

strowk commented 2 years ago

Thanks for examples @randomanderson

With Slf4j + logback + logstash encoder you should not need to write your own provider. By default, the MdcJsonProvider, writes all fields in the event's MDC to the generated json

Is there documentation explaining how I can use MdcJsonProvider to configure Datadog log injection? We currently list all the providers we use explicitly in logback.xml. I had no idea that using that one is the right way to go, which is why I had to write my own. If this were documented somewhere, that would be great!

arlyon commented 2 years ago

I have also spent quite a while jumping between bad documentation pages only to stumble on this. For the record, here is my XML, since I'm not sure how to use MdcJsonProvider. Weirdly, service and version are being injected automatically, but these aren't...

```xml
<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <includeMdcKeyName>dd.trace_id</includeMdcKeyName>
      <includeMdcKeyName>dd.span_id</includeMdcKeyName>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="CONSOLE"/>
  </root>
</configuration>
```

s7an-it commented 1 year ago

For me, with Spring/Logback, correlation doesn't happen automatically. After months of fighting this, we set up a filter chain to push the correlation ID into the MDC and eventually into the log pattern. Is there any way for me to avoid this?
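
For reference, the pattern half of that looks roughly like this (a sketch; the Datadog docs also show default values such as %X{dd.trace_id:-0}):

```xml
<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} %X{dd.trace_id} %X{dd.span_id} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="CONSOLE"/>
  </root>
</configuration>
```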

kvjrhall commented 1 year ago

@s7an-it do you think you could post an example of doing this? I'm having the same problem (logstash encoder + Logback + Spring Boot: everything is injected except dd.trace_id and dd.span_id), using spans generated by both the Datadog Java agent and manual OpenTelemetry instrumentation.

For others in the future: the support for this comes from OpenTelemetry's Logback appender. You wrap your existing appender with it, and the missing MDC attributes are injected (documentation).