DataDog / dd-trace-rb

Datadog Tracing Ruby Client
https://docs.datadoghq.com/tracing/
Other
306 stars 374 forks source link

Partial SQL query being reported in DD APM #1878

Open jpaulgs opened 2 years ago

jpaulgs commented 2 years ago

I'm looking at my DD APM traces and I can see the following be query being reported in the DD APM UI

image

I expected the highlighted query to be reported like INSERT INTO dispatches ( event_commit_id, updated_at, id, data, version ) VALUES ( ? ) ON CONFLICT ( id, version ) DO UPDATE SET event_commit_id = ? updated_at = ? id = ? data = ? version = ?. I know that these inserts are using ON CONFLICT. This is the query being executed. I verified that by turning on debug logging.

Note this insert is fairly large (approximately 200kB) as I'm inserting many records.

Is this expected behaviour? I couldn't find details confirming it in documentation or the changelog...

If it is expected behaviour could the sql spans be annotated with something to indicate that the query is truncated?

delner commented 2 years ago

There could be a few different explanations for this.

  1. Truncation is definitely a possibility; either on the trace library side, or the Datadog agent side.
  2. SQL quantization in the Datadog agent may be lopping this off, in an attempt to create tighter groupings of SQL queries.

If we can verify truncation is not happening on the trace library side, then we need to look at the agent next.

jpaulgs commented 2 years ago

I'm happy help narrow this down!

I've tried setting DD_TRACE_DEBUG=true with my application logger set to debug. That hasn't provided any additional information.

Any suggestions in regards to how I might identify if it is the DD agent or the dd-trace-rb truncating?

ZimbiX commented 2 years ago

I work with @jpaulgs. Our query is not truncated in the dd-trace-rb span at least - it is inserting hundreds of records, and the one I looked at is 193,093 characters long.

I found this via a monkeypatch similar to one we use in our ddtrace specs:

class Datadog::Span
  alias finish_orig finish
  def finish(*args)
    finish_orig(*args).tap do |span|
      $spans ||= []
      $spans << span.to_hash
    end
  end
end

and right after our query is executed:

puts $spans.last.fetch(:resource)
ZimbiX commented 2 years ago

My agent logs include:

dex-core-dd-agent-1  | 2022-02-04 07:59:36 UTC | TRACE | INFO | (pkg/trace/info/stats.go:111 in LogStats) | [lang:ruby lang_version:2.7.5 interpreter:ruby-x86_64-linux tracer_version:0.54.1 endpoint_version:v0.4] -> traces received: 1, traces filtered: 0, traces amount: 203500 bytes, events extracted: 0, events sampled: 0

Sending 203,500 bytes sounds like a good sign that it's not truncating.

Instead doing a select 1 reveals the size of the overhead as a bit under 10,160 bytes:

dex-core-dd-agent-1  | 2022-02-04 08:02:56 UTC | TRACE | INFO | (pkg/trace/info/stats.go:111 in LogStats) | [lang:ruby lang_version:2.7.5 interpreter:ruby-x86_64-linux tracer_version:0.54.1 endpoint_version:v0.4] -> traces received: 1, traces filtered: 0, traces amount: 10160 bytes, events extracted: 0, events sampled: 0

So if it's a truncation issue, that would indicate that it's a server-side Datadog issue to me.

delner commented 2 years ago

So sounds like nothing is happening on the Ruby side which means it's likely the agent. There's not much I can do in this particular library/repo, but we should continue this investigation on the trace agent side.

Can you open an issue and link back to this one? Alternatively if you prefer to go through the usual Datadog support channels, feel free to name drop me and link to this issue on that ticket to expedite the process, too.