getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License
7.94k stars 1.56k forks source link

Incorrect span timing in trace view #12281

Closed purplecandy closed 3 months ago

purplecandy commented 4 months ago

Is there an existing issue for this?

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using?

@sentry/nextjs

SDK Version

7.110.0

Framework Version

NextJS 12

Link to Sentry event

https://gosprinto.sentry.io/performance/trace/9e6329bd84a84c7a81c70e04a36601f3/?fov=0%2C5903.199951171875&node=trace-root&search=intranet&statsPeriod=1h&timestamp=1716992863

SDK Setup

  Sentry.init({
    dsn: SENTRY_DSN,
    integrations: [
      new Sentry.browserTracingIntegration({
        heartbeatInterval: 10000,
        idleTimeout: 5000,
      }),
      new Sentry.Integrations.Breadcrumbs({
        console: false,
      }),
    ],
    tracesSampleRate: 0.8,

    beforeSend: (event, hints) => {
      const location = window?.location?.href
      const area = getAreaFromUrl(location)
      event.tags.area = area
      return event
    },

    ignoreErrors: [
      // Ignore all `ResizeObserver loop limit exceeded` errors
      'ResizeObserver loop limit exceeded',
      'ResizeObserver loop completed with undelivered notifications',
    ],
  })

Steps to Reproduce

I'm trying to understand why some of our traces have a large period. Upon closer inspection, it seems like the span browser-request should represent the correct TTFB metric which is 867ms but in the Trace View, it reports the TTFB is 2.8s.

Also, when you look at the trace view's JSON payload, you can't find this leading time. The event start time and the browser request start time closely match. So, not sure how Sentry is calculating these metrics on the dashboard.

Expected Result

The correct trace view should be like this. image

Actual Result

Now, in the initial 2s, as shown in the screenshot, nothing is taking place and when you look at Sentry Replays, no indicator suggests why there should be this delay. It has completely ruined the aggregated metrics for us because it's not the case with all traces, 30% of the time it captures traces like this with an initial leading time.

image

Lms24 commented 4 months ago

Hi @purplecandy thanks for writing in!

I took a look at the code that's collecting the TTFB measurement and we don't do anything in terms of modifying the value other than taking what the browser reports via its performance API. So In your example, it seems like the browser did indeed report 2.8s for TTFB.

I'm still a bit unsure as to why all spans except for the root pageload and hydration spans are starting with this initial 1.5S delay though. This seems off but I'm not sure what exactly is causing this. @lforst any ideas if this is something NextJS specific?

lforst commented 4 months ago

This looks funky as hell. I think Next.js might be emitting a performance measurement that we pick up. It's not unrealistic that the measurement is faulty and because of that it will skew the entire duration of the transaction. I think the TTFB (wrongly) comes to be at 2796.299ms because we add cache, connect, tls, and request time to first bite PLUS and this is where we get it wrong, the 1768ms leading time from the Next.js measurement.

If we subtract the leading Next.js measurement from the TTFB we get 1010.299ms. If we add all the stuff up that should contribute to TTFB we get 1010.3ms.

So I think the value is just wrong, because of the wrong Next.js performance measurement. We should probably add an allow-list for what measurements we permit. We cannot allow 3rd parties to intervene in the data.


Side-note: The rendering of the TTFB bar also seems to be completely off. ~We may also have rounding errors.~ Seems like a rendering error. Forwarded it to the right team.