elastic / apm-agent-java

https://www.elastic.co/guide/en/apm/agent/java/current/index.html
Apache License 2.0
566 stars 320 forks source link

MDC not working in Apache Sling #1786

Closed bryantcj52 closed 3 years ago

bryantcj52 commented 3 years ago

The automatic MDC keys doesn't happen early enough for them to show up in the logs when using Apache Sling. Error.id was the only one showing up, although not every time for some reason. I had to create a custom Sling Request Filter and manually add the APM trace/transaction/span IDs to MDC.

I'm not sure if it's possible to add this here if it detects the app is running in sling.

https://gist.github.com/bryantcj52/8e5cce3d1a98d3da89b089bcf230ba61

eyalkoren commented 3 years ago

Thanks for reporting!

The automatic MDC keys doesn't happen early enough for them to show up in the logs when using Apache Sling

I am not sure how this would happen. Once we activate a transaction, the IDs are synchronously added to the MDC before the request is being handled. So, theoretically, when your filter is applied, it is already after the agent did just the same (put IDs in MDC) at the beginning of the FilterChain#doFilter() invocation as part of the transaction activation. Can you debug MdcActivationListener#before() to see what's not working as expected?

I am assuming Sling already uses slf4j by default. If not, and your logging is done directly through log4j2, note that version 1.23.0 contains a fix to a bug that prevented log correlation when using log4j2 directly.

bryantcj52 commented 3 years ago

Yes, Sling uses slf4j with logback (https://sling.apache.org/documentation/development/logging.html). I'm on the latest APM agent 1.23.0.

I set up a debug logger for co.elastic.apm.agent.mdc and even co.elastic.apm.agent but nothing is getting logged for some reason. Everything is loaded via osgi so the javaagent may be hooking in before the logging configuration? I tried adding -Dorg.slf4j.simpleLogger.defaultLogLevel=debug to the JVM options but nothing.

eyalkoren commented 3 years ago

Everything is loaded via osgi so the javaagent may be hooking in before the logging configuration?

Good direction! OSGi may be related, not so much in terms of timing, but more so in terms of visibility. Can you set log_level = debug and share the log (from startup) through https://gist.github.com/?

EDIT: this may be related to timing indeed - if we cache our MDC MethodHandle before the logging OSGi bundle is initialized. However, the Apache Sling documentation says: "Logging in Sling is supported by the org.apache.sling.commons.log bundle, which is one of the first bundles installed and started by the Sling Launcher."

bryantcj52 commented 3 years ago

@eyalkoren

https://gist.github.com/bryantcj52/dd805eaf40e5c713675292d4aecb9077

It looks like it can't find any of the MDC classes. Both org.slf4j and org.apache.log4j.MDC should be available.

I'm guessing the java agent is loaded before OSGi sets everything up. Our current bootdelegation: org.osgi.framework.bootdelegation=co.elastic.apm.*, ${org.apache.sling.launcher.bootdelegation}

eyalkoren commented 3 years ago

I'm guessing the java agent is loaded before OSGi sets everything up. Our current bootdelegation: org.osgi.framework.bootdelegation=co.elastic.apm.*, ${org.apache.sling.launcher.bootdelegation}

This does not affect the timing of loading, but where our agent classes are loaded from, and it may indeed affect class visibility. This configuration is not required since agent version 1.18.0, so first thing is to remove it (and upgrade if you are using an older agent version 🙂 ).

If this doesn't resolve the issue, it may be a matter of timing, in which case try to set this system property: -Delastic.apm.delay_agent_premain_ms=5000

bryantcj52 commented 3 years ago

Removing the boot delegation and adding the delay (went up to 15 seconds) didn't help.

bryantcj52 commented 3 years ago

We are using the -javaagent flag for setup, should we switch to the manual Attach API in a bundle activator class? I would assume we would lose any tracking of errors before the bundle starts up?

eyalkoren commented 3 years ago

Please try this analysis snapshot with the following configuration:

-Delastic.apm.log_level=debug
-Delastic.apm.log_file=agent/agent.log
-Delastic.apm.log_format_file=json

It will format agent log as ECS JSON, including MDC contents. In addition, I added the class loader that we use for MDC lookup to the logs. In the created logs, look for log lines that contain `startTransaction " and paste them here.

In addition, if you can create a dev env, set a breakpoint and debug MdcActivationListener#before(), you may be able to see what is not working as expected.

bryantcj52 commented 3 years ago

Sorry for the delay, that snapshot doesn't seem to be available anymore?

SylvainJuge commented 3 years ago

Sorry, the related PR was closed due to lack of feedback, and thus the snapshot is not available anymore.

If you are comfortable re-building the agent locally, you can just re-apply the simple change that was in this PR : https://github.com/elastic/apm-agent-java/pull/1797/files and then apply the configuration recomended by Eyal above.

SylvainJuge commented 3 years ago

There hasn't had any updates in a while, closing issue for now.

@bryantcj52 does the issue is still present in your case with the latest agent version ? If so, feel free to add a comment here and we'll re-open and further investigate the issue.