Closed bryantcj52 closed 3 years ago
Thanks for reporting!
The automatic MDC keys doesn't happen early enough for them to show up in the logs when using Apache Sling
I am not sure how this would happen. Once we activate a transaction, the IDs are synchronously added to the MDC before the request is being handled. So, theoretically, when your filter is applied, it is already after the agent did just the same (put IDs in MDC) at the beginning of the FilterChain#doFilter()
invocation as part of the transaction activation.
Can you debug MdcActivationListener#before()
to see what's not working as expected?
I am assuming Sling already uses slf4j by default. If not, and your logging is done directly through log4j2, note that version 1.23.0 contains a fix to a bug that prevented log correlation when using log4j2 directly.
Yes, Sling uses slf4j with logback (https://sling.apache.org/documentation/development/logging.html). I'm on the latest APM agent 1.23.0.
I set up a debug logger for co.elastic.apm.agent.mdc and even co.elastic.apm.agent but nothing is getting logged for some reason. Everything is loaded via osgi so the javaagent may be hooking in before the logging configuration? I tried adding -Dorg.slf4j.simpleLogger.defaultLogLevel=debug
to the JVM options but nothing.
Everything is loaded via osgi so the javaagent may be hooking in before the logging configuration?
Good direction! OSGi may be related, not so much in terms of timing, but more so in terms of visibility.
Can you set log_level = debug
and share the log (from startup) through https://gist.github.com/?
EDIT: this may be related to timing indeed - if we cache our MDC MethodHandle
before the logging OSGi bundle is initialized. However, the Apache Sling documentation says: "Logging in Sling is supported by the org.apache.sling.commons.log bundle, which is one of the first bundles installed and started by the Sling Launcher."
@eyalkoren
https://gist.github.com/bryantcj52/dd805eaf40e5c713675292d4aecb9077
It looks like it can't find any of the MDC classes. Both org.slf4j and org.apache.log4j.MDC should be available.
I'm guessing the java agent is loaded before OSGi sets everything up. Our current bootdelegation: org.osgi.framework.bootdelegation=co.elastic.apm.*, ${org.apache.sling.launcher.bootdelegation}
I'm guessing the java agent is loaded before OSGi sets everything up. Our current bootdelegation: org.osgi.framework.bootdelegation=co.elastic.apm.*, ${org.apache.sling.launcher.bootdelegation}
This does not affect the timing of loading, but where our agent classes are loaded from, and it may indeed affect class visibility. This configuration is not required since agent version 1.18.0, so first thing is to remove it (and upgrade if you are using an older agent version 🙂 ).
If this doesn't resolve the issue, it may be a matter of timing, in which case try to set this system property: -Delastic.apm.delay_agent_premain_ms=5000
Removing the boot delegation and adding the delay (went up to 15 seconds) didn't help.
We are using the -javaagent flag for setup, should we switch to the manual Attach API in a bundle activator class? I would assume we would lose any tracking of errors before the bundle starts up?
Please try this analysis snapshot with the following configuration:
-Delastic.apm.log_level=debug
-Delastic.apm.log_file=agent/agent.log
-Delastic.apm.log_format_file=json
It will format agent log as ECS JSON, including MDC contents. In addition, I added the class loader that we use for MDC lookup to the logs. In the created logs, look for log lines that contain `startTransaction " and paste them here.
In addition, if you can create a dev env, set a breakpoint and debug MdcActivationListener#before()
, you may be able to see what is not working as expected.
Sorry for the delay, that snapshot doesn't seem to be available anymore?
Sorry, the related PR was closed due to lack of feedback, and thus the snapshot is not available anymore.
If you are comfortable re-building the agent locally, you can just re-apply the simple change that was in this PR : https://github.com/elastic/apm-agent-java/pull/1797/files and then apply the configuration recomended by Eyal above.
There hasn't had any updates in a while, closing issue for now.
@bryantcj52 does the issue is still present in your case with the latest agent version ? If so, feel free to add a comment here and we'll re-open and further investigate the issue.
The automatic MDC keys doesn't happen early enough for them to show up in the logs when using Apache Sling. Error.id was the only one showing up, although not every time for some reason. I had to create a custom Sling Request Filter and manually add the APM trace/transaction/span IDs to MDC.
I'm not sure if it's possible to add this here if it detects the app is running in sling.
https://gist.github.com/bryantcj52/8e5cce3d1a98d3da89b089bcf230ba61