DataDog / dd-trace-rb

Datadog Tracing Ruby Client
https://docs.datadoghq.com/tracing/
Other
310 stars 375 forks source link

Duplicate mysql2 APM spans when connecting APM to Database Monitoring with Rails #3225

Open bessey opened 1 year ago

bessey commented 1 year ago

Current behaviour

If I have a Rails application that wants to make use of connecting Database Monitoring with APM AFAICS this is the minimum needed ddtrace configuration:

  c.tracing.instrument :mysql2, comment_propagation: "full"
  c.tracing.instrument :rails

And this works, but it leads to seemingly duplicate nested mysql2 spans:

image

The spans differ slightly:

Screenshot 2023-10-25 at 16 10 44 Screenshot 2023-10-25 at 16 10 50

I use the count of mysql2 spans as a proxy for query count, so this doubling up is rather annoying. Removing c.tracing.instrument :mysql2 fixes it, but loses comment propagation.

Expected behaviour

I expect there to be only one span here, or for the purpose of each span to be clearly different if they do represent different things.

How does ddtrace help you?

Its the best thing since sliced 🍞!

Environment

delner commented 1 year ago

Hey @bessey! Thanks for the report! We'll have someone take a look at this very soon, and get sense of what's going on. They may need to ask some more questions about how they can replicate that. We'll keep you posted.

marcotc commented 1 year ago

:wave: @bessey, to use "Database Monitoring", you have to use either the PG or mysql integration, which your application correctly has.

The duplicate span is coming from ActiveRecord. You can hide the ActiveRecord spans by disabling the integration:

c.tracing.instrument :active_record, enabled: false

Let us know if this makes your trace graph look the way you expect.

patrobinson commented 1 year ago

This should be the default, creating duplicate spans has cost implications. We're paying thousands of dollars a month in span ingestion volume costs because when we upgraded dd-trace-rb it enabled sending these duplicate spans

kigster commented 1 year ago

I've looked at these flame graphs and often times the pg span is a bit shorter than the outer ActiveRecord span. I explained it to myself as the difference between:

  1. Executing a raw query using a pg gem and getting the data back as a ruby hash, and
  2. The above + transforming the ruby hash into an array of ActiveRecord objects.

If the difference between the two spans is significant then something is wrong with the definition of the AR model. For example, it might do an additional lookup inside a constructor.

Another dimension to this would be even more valuable, but I don't know if that's possible.

If you could see the size of the process, or number of objects, over time, overlayed into the flame graph, you could instantly see how much conversion into AR models is costing you in terms of Ram.

my 2c.

farnoy commented 5 months ago

I am using pg instead of mysql2, but I have an identical issue.

What the active_record instrumentation has that makes it valuable is describes: :replica, which the DB connector instrumentations pg/mysql2 don't seem to have. Plus, when you have two spans for each DB query, the DBM to APM integration gets confused and only provides links to DBM from the outer span service:active_record and not the inner leaf span service:pg.

Please advise on how to resolve this cleanly. The recommendation given earlier for disabling active_record could work if the DB connector instrumentations were able to handle multiple database connections like AR's instrumentation does.

EDIT: Double checked and from an APM Trace view, I only get a link to DBM on the leaf service:pg span, while on the APM Servie Catalog View, I only see databases and explain plans on the parent service:active_record.

patrobinson commented 5 months ago

@farnoy we have a similar setup, Rails connecting to multiple postgres databases

We disable pg like so:

  c.tracing.instrument :pg, enabled: false

and add one active record service per database:

  dbs = Rails.configuration.database_configuration[Rails.env].keys.map do |db|
    # Some internal logic goes here to give each database a human friendly name
    [db.to_sym, db.dasherize.prepend("db-")]
  end
  dbs.to_h.each do |db, name|
    c.tracing.instrument :active_record, describes: db, service_name: name
  end

Note we had to get a bug fixed to actually disable pg, you need to be running at least 1.18.0 for it to work.

farnoy commented 5 months ago

Thanks @patrobinson , I have something similar for :active_record, but by disabling the :pg instrumentation, I lose comment propagation. Without comment propagation, I wouldn't have the APM to DBM connection. I need to keep that.

Drowze commented 2 months ago

We're also stumbling on this problem...

Our current setup has active_record instrumentation for two different roles (primary, replica). Now we'd also like to add tags to our SQL queries but doing so I end up stumbling on this issue.

Would be nice if it was possible to somehow get the SQL tags while only relying on the active_record instrumentation.

Drowze commented 2 months ago

After debugging a bit, here's what we came with to use only activerecord instrumentation (i.e.: without pg/mysql instrumentation) and still have our APM traces correlated with DBM queries. We didn't test this for long yet, but seems like a good enough solution for avoiding spans that are closely related in most of the time (which could cause confusion for our developer and perhaps an increase in our bill).

# in config/environments/production.rb
config.active_record.query_log_tags_enabled = true
config.active_record.query_log_tags = [
  # See: https://github.com/DataDog/dd-trace-rb/issues/3225
  {
    # https://github.com/DataDog/dd-trace-rb/blob/v2.3.0/lib/datadog/tracing/contrib/propagation/sql_comment.rb#L48
    traceparent: ->(_context) {
      if (trace_op = Datadog::Tracing.active_trace)
        Datadog::Tracing::Distributed::TraceContext
          .new(fetcher: nil)
          .send(:build_traceparent, trace_op.to_digest)
      end
    }
  }
]

Or, going even further and trying to replicate the sql :pg instrumentation behaviour: (just make sure to run this after configuring Datadog, e.g.: in the Datadog initializer, after the configure block)

# Extracted from:
# https://github.com/DataDog/dd-trace-rb/blob/v2.3.0/lib/datadog/tracing/contrib/propagation/sql_comment.rb#L48
Rails.configuration.active_record.query_log_tags_enabled = true
Rails.configuration.active_record.query_log_tags = [{
  Datadog::Tracing::Contrib::Propagation::SqlComment::Ext::KEY_ENVIRONMENT => Datadog.configuration.env,
  Datadog::Tracing::Contrib::Propagation::SqlComment::Ext::KEY_PARENT_SERVICE => Datadog.configuration.service,
  Datadog::Tracing::Contrib::Propagation::SqlComment::Ext::KEY_VERSION => Datadog.configuration.version,
  Datadog::Tracing::Contrib::Propagation::SqlComment::Ext::KEY_HOSTNAME => ->(_ctx) {
    if (span_op = Datadog::Tracing.active_span)
      span_op.get_tag(Datadog::Tracing::Metadata::Ext::TAG_PEER_HOSTNAME)
    end
  },
  Datadog::Tracing::Contrib::Propagation::SqlComment::Ext::KEY_DB_NAME => ->(_ctx) {
    if (span_op = Datadog::Tracing.active_span)
      span_op.get_tag(Datadog::Tracing::Contrib::Ext::DB::TAG_INSTANCE)
    end
  },
  Datadog::Tracing::Contrib::Propagation::SqlComment::Ext::KEY_PEER_SERVICE => ->(_ctx) {
    if (span_op = Datadog::Tracing.active_span)
      span_op.get_tag(Datadog::Tracing::Metadata::Ext::TAG_PEER_SERVICE)
    end
  },
  Datadog::Tracing::Contrib::Propagation::SqlComment::Ext::KEY_TRACEPARENT => ->(_ctx) {
    if (trace_op = Datadog::Tracing.active_trace)
      Datadog::Tracing::Distributed::TraceContext
        .new(fetcher: nil)
        .send(:build_traceparent, trace_op.to_digest)
    end
  }
}.transform_keys(&:to_sym)]
tylerwillingham commented 1 month ago

We have a similar issue with over-submitting spans in our Rails + PG app.

I saw the recommendation to disable the active_record instrument... for the multi-db reasons listed above I don't want to disable that instrument. Not to mention that spans with @component:active_record are responsible for the operation_name:active_record.instantiation spans which can be valuable to see.

In many cases, the active_record instrument produces spans that are more useful for debugging, but DBM configuration needs to go through the pg instrument for us and that's tying our hands in reducing our duplicate spans. It pollutes our SQL Queries tab (double instrumenting most queries) and makes for a regular scenario where I just have to tell our engineers "Sorry, this just looks weird because Datadog is duplicating X and Y" to get us what we need.