SemanticLogger integration not correlating logs with server/environment

Drowze commented 2 years ago

Current behaviour SemanticLogger integration is not correlating logs with server/environment (configured via DD_SERVICE/DD_ENV)

Expected behaviour Given the quote below, I'd expect rails_semantic_logger to just work with Datadog and correlate the logs with their traces and correct env/service (i.e. the ones supplied by the DD_ENV and DD_SERVICE environment variables)

For Rails applications using the default logger (ActiveSupport::TaggedLogging), lograge, or semantic_logger, trace correlation injection is automatically configured. _from: https://docs.datadoghq.com/tracing/other_telemetry/connect_logs_and_traces/ruby/_

Steps to reproduce

Started a Datadog docker agent using docker-compose:

dd-agent:
 image: gcr.io/datadoghq/agent:7
 pid: host
 environment:
   - DD_ENV=rafael-testing
   - DD_API_KEY=${DD_API_KEY}
   - DD_LOGS_ENABLED=true
 ports:
  - 8126:8126
  - 10518:10518
 volumes:
   - /var/run/docker.sock:/var/run/docker.sock:ro
   - /var/lib/docker/containers:/var/lib/docker/containers:ro
   - ${PWD}/dd-agent/logs.yaml:/conf.d/logs.yaml

My mounted logs configuration:

logs:
 - type: tcp
   port: 10518
   service: log-echo
   source: ruby

On a Rails server (on the host machine, not a container) I added the gems rails_semantic_logger and net_tcp_client

On datadog configuration I added:

c.tracing.instrument :rails
c.tracing.log_injection = true

On config/development.rb I got:

config.log_level = ENV.fetch("LOG_LEVEL", "info").downcase.strip.to_sym
config.rails_semantic_logger.add_file_appender = false
config.semantic_logger.add_appender(appender: :tcp, server: "localhost:10518")

On a Rails route, I just added a simple Rails.logger.info("Hello world")

Finally, I started the Rails server with:

DD_SERVICE=auth-user DD_VERSION=after-sl DD_ENV=rafael-testing2 bundle exec rails server

Result:

Environment

ddtrace version: 1.5.1
Ruby version:
Operating system: 3.1.2
Relevant library versions: net_tcp_client (2.2.0), rails_semantic_logger (4.10.0), semantic_logger (4.12.0)

Drowze commented 2 years ago

Some further testing, this time NOT using any Ruby process. With only a datadog agent with the following logs config:

logs:
  - type: tcp
    port: 10518
    service: log-echo
    source: ruby

A JSON log with dd on the root object.

echo '{
  "dd": {
    "trace_id": "754516775358317257",
    "env": "rafael-testing",
    "service": "auth-user",
    "version":42
  },
  "ddsource": "ruby-overwrite",
  "message": "HI DAWG"
}' | jq -c | nc localhost 10518

trace reported ✅ environment reported ✅ service reported (overwritten) ✅ version reported ✅ source: ignored (not overwritten) :x:

A JSON log with dd inside a named_tags object (this is how SemanticLogger does it by default)

echo '{
  "named_tags": {
    "dd": {
      "trace_id": "754516775358317257",
      "env": "rafael-testing",
      "service": "auth-user",
      "version":42
    },
    "ddsource": "ruby-overwrite"
  },
  "message": "HI DAWG"
}' | jq -c | nc localhost 10518

trace reported ✅ environment ignored :x: service ignored(not overwritten) :x: version ignored :x: source ignored (not overwritten) :x:

A JSON log with dd inside an object with a name different than named_tags

echo '{
  "weirdly_named": {
    "dd": {
      "trace_id": "754516775358317257",
      "env": "rafael-testing",
      "service": "auth-user",
      "version":42
    },
    "ddsource": "ruby-overwrite"
  },
  "message": "HI DAWG"
}' | jq -c | nc localhost 10518

trace ignored :x: environment ignored :x: service ignored(not overwritten) :x: version ignored :x: source ignored (not overwritten) :x:

A formatted plain string
```
echo '[dd.env=rafael-testing dd.service=auth-user dd.version=42 dd.source=ruby-overwrite dd.trace_id=754516775358317257] HI DAWG' | nc localhost 10518
```
trace reported ✅ environment reported ✅ service reported (overwritten) ✅ version reported ✅ source: ignored (not overwritten) :x:

Seeing the above results, some points:

Is ddsource/dd.source really necessary on the log string? At least on my setup it felt a bit useless.
trace_id seems to be the only correctly detected attribute inside a named_tags object. Can Datadog correctly detect other attributes as well? It seems to be a bug that it doesn't.

TonyCTHsu commented 2 years ago

👋 @Drowze Thanks for reporting, I will take a look at this issue.

TonyCTHsu commented 1 year ago

👋 @Drowze , I want to get back to you about this issue.

I am still communicating with our internal backend team to figure out the culprit that is causing the correlation issue with semantic logger.

So far, with semantic logger format, only correlation for trace_id and span_id in the named_tags are implemented. service and source from agent configuration also seems to be implemented.

I will keep you posted with my research and make changes for tracer if necessary. 😉

TonyCTHsu commented 1 year ago

@Drowze Sorry I have not made too much progress on this.

Could you file for a ticket about this issue? This would go through layers of triage from our support engineers(and to track this), would eventually find the owner of this.

https://help.datadoghq.com/hc/en-us/requests/new

You could simply referred to this Github issue, and tagged me for suggesting doing so.

silasb commented 6 months ago

Any updates around this?

tylerwillingham commented 1 month ago

We're noticing an issue in our setup that may be similar but possibly distinct.

We're in a containerized environment and the SemanticLogger.host attribute is picking up our pod name and propagating that up as the host value in our logs. (i.e. a JSON log starts with { host: 'pod-name-example' ....

This has caused us correlation headaches where we now have significant deltas in how logs from a single pod are reported because the majority of logs are application logs via rails_semantic_logger including the invalid hostname.

It looks like datadog stream-logs has the proper host information associated at the agent-level, but the log's value is overriding it. To get around this, I need a custom formatter that looks like:

module SemanticLogger
  module Formatters
    class HostlessJson < SemanticLogger::Formatters::Raw
      def initialize(time_format: :iso_8601, time_key: :timestamp, **args)
        super(time_format: time_format, time_key: time_key, **args)
      end

      # NOTE: `#call` is where all of the changes live. This is basically a carbon copy of SemanticLogger::Formatters::Json
      #  formatter but with the `h.delete(:host)` call.
      def call(log, logger)
        log_hash = super(log, logger).tap do |h|
          h.delete :host
        end.to_json
      end
    end
  end
end

I think this could equally be seen as a Datadog or a SemanticLogger issue since SemanticLogger.host is implemented in such a way that it's always non-nil, but the Datadog instrumentation is implemented via a patch, rather than a custom appender but the impact this has on us is greater on the Datadog side, as a customer

Edit: It turns out SemanticLogger has an API for this behavior if you specify a custom formatter.

# config/environments/production.rb
log_formatter = SemanticLogger::Formatters::Json.new(
  log_host: false # ✨✨✨
)
config.semantic_logger.add_appender(io: STDOUT, formatter: log_formatter)

januszm commented 2 days ago

I see that this issue was opened 2 years ago, does it mean semantic_logger integration is still broken or will we see trace_id and span_id in logs and proper correlation with default semantic_logger setup and dd c.tracing.log_injection = true ?

marcotc commented 1 day ago

👋 Sorry for the delay here. I tried a very simple local app outputting SemanticLogger log lines, and I can't get the any sort of rich parsing from the Logs product:

# I get "No attributes have been extracted from the log message." for
SemanticLogger.add_appender(io: log_output) 

# I don't get the log line at all for
SemanticLogger.add_appender(io: log_output, formatter: :json)

# For reference, my structured message looks like this in memory:
{"host":"my-host","application":"Semantic Logger","timestamp":"2024-11-08T19:43:49.177537Z","level":"info","level_index":2,"pid":1234,"thread":"132","named_tags":{"dd":{"env":"production","service":"MyService","version":"1.2.3","trace_id":"1039570011634864780","span_id":"1469129669426488971"},"ddsource":"ruby"},"name":"TestClass","message":"My Message!"}

I'll try to setup a basic Rails app from scratch, then add SemanticLogger, to ensure that I'm using a realistic setup, instead of manually using the SemanticLogger API which is not a common application configuration.

tylerwillingham commented 1 day ago

I use SemanticLogger at work with datadog 2.x (we're running on 2.5 in prod currently).

rails_semantic_logger + datadog gems work as expected.

In config/environments/production.rb there are a couple of additional lines to log to STDOUT via JSON and you can use config.rails_semantic_logger.format = :json in config/environments/development.rbto getlogs/development.log` printing in JSON under RAILS_ENV=development.

Trace correlation works out of the box for us with the current SemanticLogger integration BUT I wanted to post that bit about host correlation because it was a gotcha for us which caused DD logs to not correlate to their host infra properly until I dropped the host value from the payload.

I was able to refactor it to be more simple by using:

config.semantic_logger.add_appender(
  io: STDOUT,
  formatter: SemanticLogger::Formatters::Json.new(log_host: false)
)

DataDog / dd-trace-rb

SemanticLogger integration not correlating logs with server/environment #2356