DataDog / dd-trace-java

Datadog APM client for Java
https://docs.datadoghq.com/tracing/languages/java
Apache License 2.0

Agent log output in JSON? #1109

Open dpratt opened 4 years ago

dpratt commented 4 years ago

We run all our services in docker containers in a k8s cluster, as I suspect quite a few Datadog users do, and as such we have our standard boilerplate set to emit all logs from all sources (channeled through slf4j) as properly formatted single line JSON. This allows for the Datadog agent to pick up and properly index logs and deal with things like stack traces in a sane fashion.

For the most part, this works quite well, except for log messages printed to standard out by the Datadog Java agent. Since the slf4j infrastructure at that level is loaded and configured in a different classloader, we don't really have any way to influence its output - it appears to just be using the stock simple slf4j logger implementation, which only really allows for coarse-grained configuration of log levels.

Is there a mechanism by which we could have the java agent use logback or some other slf4j binding that outputs JSON instead? We have people getting woken up at all hours of the night due to errors generating alerts because the DD log infrastructure interprets anything that can't be parsed as JSON at the ERROR level. We've mitigated a bunch of this by adding -Ddatadog.slf4j.simpleLogger.defaultLogLevel=WARN to our JVM launch parameters, but there's still a bunch of stuff that occasionally gets through.
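
For context, here's roughly how we pass that flag in our container images. This is just a sketch; the image layout and jar paths are hypothetical, not something Datadog prescribes:

```dockerfile
# Sketch of a container setup (paths hypothetical): raise the tracer's
# log threshold via JAVA_TOOL_OPTIONS so fewer plain-text lines reach
# stdout/stderr. JAVA_TOOL_OPTIONS is picked up by any HotSpot JVM.
ENV JAVA_TOOL_OPTIONS="-Ddatadog.slf4j.simpleLogger.defaultLogLevel=WARN"
ENTRYPOINT ["java", "-javaagent:/dd-java-agent.jar", "-jar", "/app.jar"]
```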

jeremy-lq commented 4 years ago

Apologies for the delay @dpratt. Can you open a support ticket so we can properly track this feature request? Thanks.

svetozar02 commented 4 years ago

I'm interested in this as well. May I point out that the DD docs recommend logging in logstash JSON format, but the same can't easily be accomplished with the DD agent...

matt-canty-dragon commented 4 years ago

Is there an update that you might be able to share @dpratt? I have the same issue.

raj-saxena commented 4 years ago

@jeremy-lq has this been implemented? Is there another ticket/issue where this is being tracked?

philippneugebauer commented 3 years ago

@svetozar02 couldn't agree more with your statement. DD makes a recommendation in its docs and then completely contradicts it with its own software.

asharma3007 commented 3 years ago

Supporting development of this feature (which I see as a bug) - for the reasons others have described. Our monitoring team has to spend time looking at every alert, and these are fruitless ones. We are increasing the risk of alert fatigue as well.

robinglen commented 2 years ago

This has become a huge issue for us enabling Datadog application security. Enabling it means you get potential logs like:

WARN com.datadog.appsec.gateway.AppSecRequestContext - xxx

You can clearly see in the log that it's a warning, in this case about a potential security issue, but I quote from the support case: "It looks like these logs may be going to stderr, is that right? If so, we assign the ERROR status to all stderr logs by default (noted in the Important Notes here: https://docs.datadoghq.com/agent/docker/log/?tab=hostagent#installation )."

This means our applications are now flooded with error logs that we can't really do anything with.

ValentinZakharov commented 2 years ago

@thearegee I've seen the support case you are talking about. I'm sorry for that bad experience. There was an AppSec issue which caused the WARN message to be spammed for every request. It's now fixed and will be available in the coming release (today). In addition, we are going to rework the logging system to make it neater in future releases.

robinglen commented 2 years ago

@ValentinZakharov thank you, that's great news, and thanks for getting this in so quickly. We will upgrade today and see how it looks. If we can help with the logging work, please let me know.

franciscosantana commented 2 years ago

Hi @ValentinZakharov! Any update on the logging system rework?

jon-p commented 2 years ago

Any update on the logging system rework?

Any movement on this? 3 years later and we're still having to configure datadog agent to log to System.out so as not to spam fake errors to the datadog log... which is kinda funny/annoying.

ValentinZakharov commented 2 years ago

I'm sorry for the delay. The rework of the current logging system has been postponed due to its critical impact on existing customers. However, we are trying to address customer issues with minor fixes. @jon-p If I understood the problem correctly: because the Java tracer sends messages to stderr, they are detected as errors in the Datadog dashboard, even if they are just informational messages. If I'm correct, then you can try to configure the Java tracer logs to be written to a dedicated log file with the option:

-Ddatadog.slf4j.simpleLogger.logFile=<NEW_LOG_FILE_PATH>

Ref: documentation. In this case, I suppose, the Agent should ignore messages from the Java tracer.
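
For example, a full launch command with this option might look like the following. The jar and log file paths here are purely illustrative:

```shell
# Illustrative only: write tracer logs to a dedicated file instead of
# stderr, so the Agent's container log collection doesn't pick them up.
# All paths are hypothetical.
java -javaagent:/opt/dd-java-agent.jar \
     -Ddatadog.slf4j.simpleLogger.logFile=/var/log/dd-tracer.log \
     -jar /opt/application.jar
```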

Please let me know if that doesn't solve the problem.

dpratt commented 2 years ago

The solution above doesn't really solve the problem: if you redirect the log messages to a file, they'll likely just be ignored, since they're not going to stderr/stdout. Additionally, if you're running in a container, this is likely an anti-pattern, since you'll be writing to an ephemeral file that will simply be discarded when your container exits. The real solution here would be to do one of two things -

ValentinZakharov commented 2 years ago

@dpratt thanks for sharing the details of the proposed solution. I've discussed the details of this feature request with colleagues. Unfortunately, due to limited resources, new features are implemented in accordance with the priorities set by the product management team. Mostly, priorities depend on the number and size of customers. At the moment, logger improvements are in the backlog with low priority, so I cannot provide any time estimate for this feature.

bkmjaris commented 2 years ago

Just in case it helps anyone else -- Our office has also filed a feature request ticket with DD Support (892588) providing a link to this GitHub issue. No new timing or announcements are available, but we've added our voice in support of a solution.

mrzor commented 2 years ago

I'm adding our account to the list of affected customers who would like to see a resolution.

Given how hard it is to build dd-trace-java on our side, and the fact that the agent's classloader is fully locked, short of monkeypatching classfiles we have no way of getting properly structured logs out of the APM agent.

der-eismann commented 1 year ago

We also stumbled over this in our trial phase. Datadog recommends JSON logging in its own docs, but then the agent logs get marked as errors in the Datadog web UI.

briceburg commented 1 year ago

Does setting the DD_LOG_FORMAT_JSON environment variable or applying the log_format_json datadog.yml configuration output the desired structured logs? These would be agent logs only (e.g. https://hub.docker.com/r/datadog/agent) -- it's up to individual services/applications to format their own logs. Afaik there's no "java"-only agent that the OP is referring to?

https://github.com/DataDog/datadog-agent/blob/a3d970f0eb8359f7a4676768a91f0165e8203887/pkg/config/config_template.yaml#L2050-L2054
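
For reference, the corresponding datadog.yaml fragment would be roughly as follows. Note this only changes the format of the Agent's own logs, not those of the Java tracer:

```yaml
# datadog.yaml: make the Agent emit its own logs as JSON.
# Equivalent to setting DD_LOG_FORMAT_JSON=true in the environment.
log_format_json: true
```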

greg-p-slingshot commented 1 year ago

@briceburg, I assume the above posters are talking about the dd-java-agent.jar that the documentation instructs you to download and run with your Java app: https://docs.datadoghq.com/tracing/trace_collection/dd_libraries/java/#add-the-java-tracer-to-the-jvm Calling it the "java agent" seems reasonable to me, although it might not be the correct term. And yes, I've tried setting the DD_LOG_FORMAT_JSON=true environment variable, the yml config, and a few other things, all to no avail.

briceburg commented 1 year ago

OK yes, I struggle with this too across the various ddtrace implementations.

In an effort to "have all logs in JSON", I reduced the output of ddtrace to WARNING by setting the DD_TRACE_LOG_FILE_LEVEL=WARNING environment variable wherever it gets included (e.g. the dd-java-agent.jar - why that is not called dd-java-trace.jar is another question ;)). Datadog then seems to do a good job in its log processors of properly tagging the message, but yeah, it's super annoying that it's not in JSON.

jakubkalicki-tink commented 1 year ago

When can we expect this problem to be fixed? 😞

I have tried DD_TRACE_LOG_FILE_LEVEL=WARNING mentioned by @briceburg, but I still see info logs

gabrielhof commented 1 year ago

I was able to configure the java agent log messages to go to stdout and also get the messages correctly parsed by Datadog.

TL;DR

If you just want Datadog to stop identifying the agent's log messages as errors, add the following when running your application via java -jar:

-Ddatadog.slf4j.simpleLogger.logFile=System.out

If you also want these logs to get parsed by datadog, add the following:

-Ddatadog.slf4j.simpleLogger.dateTimeFormat="yyyy-MM-dd HH:mm:ss.SSS"

Explanation:

About logs going to stderr:

The default value for datadog.slf4j.simpleLogger.logFile is System.err, which causes logs to go to stderr.

Changing this property to use System.out fixes it.

About getting Datadog to parse the logs

The default for datadog.slf4j.simpleLogger.dateTimeFormat is '[dd.trace 'yyyy-MM-dd HH:mm:ss:SSS Z']'.

The default date format causes the agent log messages not to be accepted by any of Datadog's default formats for parsing Java logs.

Simply changing the log date format to yyyy-MM-dd HH:mm:ss.SSS is enough for one of the default formats to parse the message.
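
Putting both properties from the TL;DR together, a complete launch command might look like this (the jar paths are hypothetical):

```shell
# Sketch: send tracer logs to stdout and use a date format that one of
# Datadog's default Java log pipelines can parse. Paths are hypothetical.
java "-Ddatadog.slf4j.simpleLogger.logFile=System.out" \
     "-Ddatadog.slf4j.simpleLogger.dateTimeFormat=yyyy-MM-dd HH:mm:ss.SSS" \
     -javaagent:/dd-java-agent.jar \
     -jar /application.jar
```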

ls-rein-martha commented 1 year ago

Another workaround (only for users of the Datadog Log Explorer feature) is to create a new Grok processor in your Datadog log pipeline. You can also add new rules to the existing default Java processors.

I added the same rules as the default Java processor and updated the date pattern.

Rule:

irregular_dd_format \[dd\.trace %{_irregular_date}\]\s+\[%{_thread_name}\]\s+%{_status}\s+%{_logger_name}\s*(%{_context}\s*)?- (?>%{word:dd.trace_id} %{word:dd.span_id} - )?%{data:message}((\n|\t)%{data:error.stack})?

Helper rule:

_irregular_date %{date("yyyy-MM-dd HH:mm:ss:SSS Z"):timestamp}
_duration %{integer:duration}
_thread_name %{notSpace:logger.thread_name}
_status %{word:level}
_logger_name %{notSpace:logger.name}
_context %{notSpace:logger.context}
_line %{integer:line}

SJrX commented 1 year ago

Thank you @gabrielhof. I think that the system property should not have quotes, or if it should, then you should quote the whole thing. At least for me, in a Dockerfile, the quotes just cause quotes to appear around the timestamp.

i.e., I think changing this:

-Ddatadog.slf4j.simpleLogger.dateTimeFormat="yyyy-MM-dd HH:mm:ss.SSS"

to

"-Ddatadog.slf4j.simpleLogger.dateTimeFormat=yyyy-MM-dd HH:mm:ss.SSS"
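
In a Dockerfile exec-form ENTRYPOINT, that means quoting each JSON array element as a whole, e.g. (jar path hypothetical):

```dockerfile
# Quote the whole -D argument: the value keeps its embedded space but
# gains no literal quote characters. Jar path is hypothetical.
ENTRYPOINT ["java", "-Ddatadog.slf4j.simpleLogger.dateTimeFormat=yyyy-MM-dd HH:mm:ss.SSS", "-jar", "/application.jar"]
```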

nasilrgb commented 4 months ago

I got this error from the javaagent. Why is it showing as an error?

[screenshot of the agent error log]
sausti commented 4 days ago

I got this error from the javaagent. Why is it showing as an error?

[screenshot of the agent error log]

@nasilrgb Were you able to prevent that log from appearing at the error level? We updated our Dockerfile with the suggestions above, i.e.:

ENTRYPOINT ["java", "-Ddatadog.slf4j.simpleLogger.logFile=System.out", "-Ddatadog.slf4j.simpleLogger.dateTimeFormat=yyyy-MM-dd HH:mm:ss.SSS", "-jar", "/application.jar"]

With that, all other Datadog agent logs get the correct log level, but this one, for some reason, is still logged at the error level.