getsentry / sentry-go

The official Go SDK for Sentry (sentry.io)
https://docs.sentry.io/platforms/go/
MIT License
909 stars 212 forks source link

Continuing transaction/spans from an async queue, missing parent tx #858

Closed MattJustMatt closed 3 months ago

MattJustMatt commented 3 months ago

Summary

I have a service that processes a Redis queue. The traceId is pulled from the queue data and I'd like to attach these spans coming from an async processing step to the root tx from an upstream service.

I tried following the guide here: https://docs.sentry.io/platforms/go/tracing/instrumentation/custom-instrumentation/

I see the root transaction show up in Sentry along with some other correlated spans, but only one of several spans from this service show up in the Sentry dashboard AND they are labeled as " < unlabeled transaction > " when they do.

Figuring I'm mucking something up with ctx or otherwise, but having trouble figuring out why it doesn't seem to continue the trace.

Steps To Reproduce

Pass in a traceId set like 3c4b45444c084ce3b0025fd9d16e6abf-99b906e1f314c2c7-1 to span options outside of any http handlers etc...

            var span *sentry.Span
            if traceId != "" {
                gameCtx = sentry.SetHubOnContext(gameCtx, sentry.CurrentHub().Clone())
                log.Printf("Using traceId %s for player %s", traceId, playerID)
                options := []sentry.SpanOption{
                    sentry.WithOpName("Galactus Queue"),
                    sentry.ContinueFromTrace(traceId),
                }

                transaction := sentry.StartTransaction(gameCtx, fmt.Sprintf("Galactus Queue - %s", region), options...)
                defer transaction.Finish()
                span = sentry.StartSpan(gameCtx, "get-game-info")
            }

            gameInfo, err := aomongo.GetGameInfo(gameID)
            if err != nil {
                log.Printf("[PROCESS QUEUE - %s] Error fetching game info for game %s: %v", region, gameID, err)
                sentry.CaptureException(fmt.Errorf("failed to fetch game info for game %s: %s", gameID, err))
                continue
            }

            if span != nil {
                span.Finish()
                sentry.StartSpan(gameCtx, "fcall-process-queued-player")
            }

Expected Behavior

The transaction/spans would be associated with the set from the traceId being passed in

Sentry.io Event

https://all-out-fb.sentry.io/performance/trace/6b356faf8a054d97d433946e70483d07/?fov=0.005223210642291684%2C1&node=txn-ce4199e16a5c480ca43eefd102e26d29&project=4506334475649024&source=traces&statsPeriod=1h&timestamp=1721086740

Service Logs

2024-07-15 16:39:00.210 | [Sentry] 2024/07/15 23:39:00 Skipping transaction profiling: ProfilesSampleRate is: 0.000000 |  
-- | -- | --
  |   | 2024-07-15 16:39:00.209 | [Sentry] 2024/07/15 23:39:00 Skipping transaction profiling: ProfilesSampleRate is: 0.000000 |  
  |   | 2024-07-15 16:39:00.209 | [Sentry] 2024/07/15 23:39:00 Sending transaction [ce4199e16a5c480ca43eefd102e26d29] to o4505228622036992.ingest.sentry.io project: 4506334475649024 |  
  |   | 2024-07-15 16:39:00.208 | [Sentry] 2024/07/15 23:39:00 Sending transaction [ea36b10e538c4657a3780e831ae43687] to o4505228622036992.ingest.sentry.io project: 4506334475649024 |  
  |   | 2024-07-15 16:39:00.207 | 2024/07/15 23:39:00 [JOIN REQ] Finished /matchmake handler for player 6580bccd9883d7e53c069e22 (matt) game 654eedec2412b8afb8a993ca to region (auto) |  
  |   | 2024-07-15 16:39:00.207 | [Sentry] 2024/07/15 23:39:00 Skipping transaction profiling: ProfilesSampleRate is: 0.000000 |  
  |   | 2024-07-15 16:39:00.207 | 2024/07/15 23:39:00 Using traceId 3c4b45444c084ce3b0025fd9d16e6abf-99b906e1f314c2c7-1 for player 6580bccd9883d7e53c069e22

The traceId coming down looks correct and the last bit is 1 to indicate that it should be sampled

Environment

Linux VM in GCP with Go process running directly

SDK

Sentry

cleptric commented 3 months ago

I agree that the docs could be a bit confusing on this matter, but you need to pass transaction.Context() into your startSpan() calls. As the context is passed by value, it won't get mutated.

Screenshot 2024-07-16 at 02 29 03

A different option would be to use span.StartChild(operation, options...) which is the same as StartSpan(span.Context(), operation, options...).

MattJustMatt commented 3 months ago

@cleptric thanks for taking a look! I made this change, but now no spans/txs are showing up from this path.

I'm guessing there is something wrong with my tx options or how I'm resuming the trace? I think the first span showed up because it was decoupled from the "broken transaction" but now that all the spans are correctly parented none are showing

cleptric commented 3 months ago

What does the debug output say? In your example above, you're also not finishing the last span.

MattJustMatt commented 3 months ago

Logs tell all! Even though that particular span was being finished later, the tx.finish being deferred wouldn't work in this case since the fn never goes out of scope (it's a in a loop reading from the queue...) good now!

Thanks for taking a look!