DataDog / dd-trace-go

Datadog Go Library including APM tracing, profiling, and security monitoring.
https://docs.datadoghq.com/tracing/
Other
610 stars 419 forks source link

[BUG] Data race in tracer/payload.go #2631

Open jared-mackey opened 3 months ago

jared-mackey commented 3 months ago

Version of dd-trace-go v1.61.0

Describe what happened: It appears that there is a data race in reporting traces to DD. Might be related to #330. However, we are seeing it in our staging environment where we have data race detection enabled and sending traces to real DD servers.

Apologies for the poor formatting. Got it from a CSV export from DD logs.

==================
WARNING: DATA RACE
Read at 0x00c002f35a58 by goroutine 644368:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).Read()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/payload.go:144 +0x72
  io.copyBuffer()
  /usr/local/go/src/io/io.go:429 +0x29a
  io.CopyBuffer()
  /usr/local/go/src/io/io.go:402 +0x8f
  net/http.(*transferWriter).doBodyCopy()
  /usr/local/go/src/net/http/transfer.go:416 +0x144
  net/http.(*transferWriter).writeBody()
  /usr/local/go/src/net/http/transfer.go:360 +0xee9
  net/http.(*Request).write()
  /usr/local/go/src/net/http/request.go:755 +0x1413
  net/http.(*persistConn).writeLoop()
  /usr/local/go/src/net/http/transport.go:2447 +0x379
  net/http.(*Transport).dialConn.gowrap3()
  /usr/local/go/src/net/http/transport.go:1800 +0x33

Previous write at 0x00c002f35a58 by goroutine 644622:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).updateHeader()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/payload.go:125 +0x39b
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).reset()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/payload.go:99 +0x2e
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush.func1()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/writer.go:121 +0x204
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush.gowrap1()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/writer.go:126 +0x41

Goroutine 644368 (running) created at:
  net/http.(*Transport).dialConn()
  /usr/local/go/src/net/http/transport.go:1800 +0x27fe
  net/http.(*Transport).dialConnFor()
  /usr/local/go/src/net/http/transport.go:1485 +0x124
  net/http.(*Transport).queueForDial.gowrap1()
  /usr/local/go/src/net/http/transport.go:1449 +0x44

Goroutine 644622 (running) created at:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/writer.go:91 +0x2e4
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*tracer).worker()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/tracer.go:367 +0x284
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newTracer.func2()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/tracer.go:318 +0x16d

Some other logs that are relevant

Mar 22 07:13:14.692 Datadog Tracer v1.61.0 ERROR: lost 2 traces: timeout
Mar 22 07:13:14.692 (Status: Request Timeout) (occurred: 22 Mar 24 13:12 UTC)
Mar 22 07:13:14.692 Datadog Tracer v1.61.0 ERROR: failure sending traces (attempt 1), will retry: timeout
Mar 22 07:13:14.692 (Status: Request Timeout) (occurred: 22 Mar 24 13:12 UTC)

Describe what you expected: No data races :)

Additional environment details (Version of Go, Operating System, etc.): Golang 1.22.1 Binaries compiled with -race Running inside EKS on x86 instances Agent (“DD servers”) running as a daemonset. Should be a local pod to this pod.

darccio commented 3 months ago

@jared-mackey Thanks for reaching out! We are going to review it.

SvenGasterstaedt commented 2 months ago

We are also experiencing some data races regarding tracer/payload.go

this is similar to the one mentioned above (but read/write are switched)

WARNING: DATA RACE
Write at 0x00c00197fc98 by goroutine 95651:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).updateHeader()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/payload.go:129 +0x1c8
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).reset()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/payload.go:99 +0x33
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush.func1()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/writer.go:121 +0x93e
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush.gowrap1()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/writer.go:126 +0x46

Previous read at 0x00c00197fc98 by goroutine 93651:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).Read()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/payload.go:144 +0x9e
  io.copyBuffer()
      /usr/local/go/src/io/io.go:429 +0x596
  io.CopyBuffer()
      /usr/local/go/src/io/io.go:402 +0x145
  net/http.(*transferWriter).doBodyCopy()
      /usr/local/go/src/net/http/transfer.go:416 +0x247
  net/http.(*transferWriter).writeBody()
      /usr/local/go/src/net/http/transfer.go:360 +0x530
  net/http.(*Request).write()
      /usr/local/go/src/net/http/request.go:755 +0x2445
  net/http.(*persistConn).writeLoop()
      /usr/local/go/src/net/http/transport.go:2447 +0x478
  net/http.(*Transport).dialConn.gowrap3()
      /usr/local/go/src/net/http/transport.go:1800 +0x38

Goroutine 95651 (running) created at:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/writer.go:91 +0x32c
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*tracer).worker()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:379 +0x6ad
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newTracer.func2()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:326 +0x2af

Goroutine 93651 (running) created at:
  net/http.(*Transport).dialConn()
      /usr/local/go/src/net/http/transport.go:1800 +0x45d7
  net/http.(*Transport).dialConnFor()
      /usr/local/go/src/net/http/transport.go:1485 +0x195
  net/http.(*Transport).queueForDial.gowrap1()
      /usr/local/go/src/net/http/transport.go:1449 +0x50

and this is one data race regarding the reset of the payload:

WARNING: DATA RACE
Write at 0x00c000a626e0 by goroutine 95651:
  bytes.(*Reader).Seek()
      /usr/local/go/src/bytes/reader.go:117 +0x68
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).reset()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/payload.go:101 +0x9a
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush.func1()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/writer.go:121 +0x93e
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush.gowrap1()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/writer.go:126 +0x46

Previous write at 0x00c000a626e0 by goroutine 93651:
  bytes.(*Reader).Read()
      /usr/local/go/src/bytes/reader.go:43 +0x132
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).Read()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/payload.go:153 +0x3c5
  io.copyBuffer()
      /usr/local/go/src/io/io.go:429 +0x596
  io.CopyBuffer()
      /usr/local/go/src/io/io.go:402 +0x145
  net/http.(*transferWriter).doBodyCopy()
      /usr/local/go/src/net/http/transfer.go:416 +0x247
  net/http.(*transferWriter).writeBody()
      /usr/local/go/src/net/http/transfer.go:360 +0x530
  net/http.(*Request).write()
      /usr/local/go/src/net/http/request.go:755 +0x2445
  net/http.(*persistConn).writeLoop()
      /usr/local/go/src/net/http/transport.go:2447 +0x478
  net/http.(*Transport).dialConn.gowrap3()
      /usr/local/go/src/net/http/transport.go:1800 +0x38

Goroutine 95651 (running) created at:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/writer.go:91 +0x32c
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*tracer).worker()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:379 +0x6ad
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newTracer.func2()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:326 +0x2af

Goroutine 93651 (running) created at:
  net/http.(*Transport).dialConn()
      /usr/local/go/src/net/http/transport.go:1800 +0x45d7
  net/http.(*Transport).dialConnFor()
      /usr/local/go/src/net/http/transport.go:1485 +0x195
  net/http.(*Transport).queueForDial.gowrap1()
      /usr/local/go/src/net/http/transport.go:1449 +0x50