DataDog / dd-trace-rb

Datadog Tracing Ruby Client
https://docs.datadoghq.com/tracing/

`IOError` during tracing when running `PG::Connection#exec` in a thread #3891

Open tdeo opened 2 months ago

tdeo commented 2 months ago

Current behaviour

Hello,

We're getting the following `IOError` when calling `PG::Connection#exec` from threads started by the parallel gem (usage shown after the stack trace):

IOError: stream closed in another thread (IOError)
  from ddtrace (1.23.3) lib/datadog/tracing/contrib/pg/instrumentation.rb:27:in `exec'
  from ddtrace (1.23.3) lib/datadog/tracing/contrib/pg/instrumentation.rb:27:in `block in exec'
  from ddtrace (1.23.3) lib/datadog/tracing/contrib/pg/instrumentation.rb:142:in `block in trace'
  from ddtrace (1.23.3) lib/datadog/tracing/trace_operation.rb:198:in `block in measure'
  from ddtrace (1.23.3) lib/datadog/tracing/span_operation.rb:150:in `measure'
  from ddtrace (1.23.3) lib/datadog/tracing/trace_operation.rb:198:in `measure'
  from ddtrace (1.23.3) lib/datadog/tracing/tracer.rb:385:in `start_span'
  from ddtrace (1.23.3) lib/datadog/tracing/tracer.rb:159:in `block in trace'
  from ddtrace (1.23.3) lib/datadog/tracing/context.rb:45:in `activate!'
  from ddtrace (1.23.3) lib/datadog/tracing/tracer.rb:158:in `trace'
  from ddtrace (1.23.3) lib/datadog/tracing.rb:18:in `trace'
  from ddtrace (1.23.3) lib/datadog/tracing/contrib/pg/instrumentation.rb:105:in `trace'
  from ddtrace (1.23.3) lib/datadog/tracing/contrib/pg/instrumentation.rb:26:in `exec'
  from packs/devtools/db/app/services/devtools/fork_database_helper.rb:217:in `block in analyze'

My usage:

# packs/devtools/db/app/services/devtools/fork_database_helper.rb

def analyze(tables_to_analyze)
  Parallel.each(tables_to_analyze, in_threads: 4) do |table|
    conn = PG::Connection.new(connection_string)
    conn.exec("ANALYZE #{table}") # This is line 217
  ensure
    conn.finish if conn && !conn.finished?
  end
end
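
For context on the error message itself: in plain Ruby, "stream closed in another thread" is typically what a thread sees when it is blocked on an IO object that a different thread closes. The sketch below reproduces that message with a pipe only. It is a generic illustration, not a reproduction of this issue: it does not involve pg, parallel, or ddtrace, the method name `closed_stream_error_message` is hypothetical, and whether the same mechanism is at play in the trace above is an assumption that has not been confirmed.

```ruby
# Generic illustration only: no pg, parallel, or ddtrace involved.
# `closed_stream_error_message` is a hypothetical helper name.
def closed_stream_error_message
  reader, writer = IO.pipe

  blocked = Thread.new do
    begin
      reader.read # blocks: nothing is written and the write end stays open
      nil
    rescue IOError => e
      e.message
    end
  end

  # Wait until the thread is parked inside the blocking read.
  sleep 0.01 until blocked.status == "sleep"

  reader.close  # closing from a different thread interrupts the blocked read
  blocked.value # the rescued message, or nil if no IOError was raised
ensure
  writer&.close
end

puts closed_stream_error_message
```

If the same pattern applies here, something would have to close the connection's socket while `exec` is still waiting on it, which is why it may be worth checking whether any connection or IO object is shared across the worker threads.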

Expected behaviour

No error is thrown

Steps to reproduce

Running the code above

Environment

ivoanjo commented 1 month ago

Hey @tdeo! Thanks for the report, and for the patience with our slow answer >_>

Yesterday I set aside some time to try to reproduce this and... I wasn't successful.

It sounds like you may be able to trigger this on your side. If you're still up to helping us debug and fix this (if not -- that's ok! We did take a bunch of time to get back to you), can I ask you to try:

While we don't know of issues with either of those two currently, I think those would be the most likely culprits for pinning this down.