getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License
7.95k stars 1.57k forks source link

sentry-trace header is not being attached to Apollo GraphQL requests #2860

Closed dhruska closed 4 years ago

dhruska commented 4 years ago

Package + Version

Version:

5.22.1

Description

It's my understanding according to this documentation that HTTP requests should have a sentry-trace header attached. If I make a fetch() request from my application frontend to my graphql endpoint, the header is attached. However, graphql requests from the Apollo library do not have this header attached.

I have previously manually added a different header for tracing purposes using the explanation provided here, but I'm trying to leverage the Sentry performance monitoring features so I would think the sentry-trace header is required.

lobsterkatie commented 4 years ago

So, to be clear, a manual fetch request to your graphql endpoint has the headers attached but a request made to the same endpoint using Apollo does not have the headers? If you watch the network tab, is Apollo making a fetch request or an XHR?

dhruska commented 4 years ago

Hi @lobsterkatie, that's correct. As far as I can tell when looking at the network tab, Apollo is also using fetch (I don't see any differences at least). Screenshots below. The first screenshot is the manual request that has the sentry-trace header (it's 500ing because I'm not passing a graphql request, just hitting the endpoint), and the second screenshot is the request sent by Apollo. Also, looking at their source, it looks like they use the native browser fetch API as far as I can tell.

If helpful, my integrations setup for this currently is new Integrations.BrowserTracing({ tracingOrigins: ["http://localhost:3001"] }), and my tracesSampleRate is 1.

Screen Shot 2020-08-31 at 1 45 49 PM Screen Shot 2020-08-31 at 1 46 53 PM
rhcarvalho commented 4 years ago

@dhruska could it be that Apollo is being imported/loaded before Sentry in your project? If that's the case, Apollo might be grabbing a reference to the original fetch before the Sentry SDK patches it.

dhruska commented 4 years ago

@rhcarvalho that was absolutely the problem. Thank you!

Another question - now that the header is being passed to my server, I don't think it's being consumed anywhere. Our GraphQL server runs on top of Hapi, so I can't use the built-in Express integration. Is there some manual instrumentation I can add to consume this header in my GraphQL context somewhere?

kamilogorek commented 4 years ago

@dhruska this is the part you are interested in - https://github.com/getsentry/sentry-javascript/blob/4ee0cd2a2cbbc42d1e86fb61666dde80e2a9ed2b/packages/node/src/handlers.ts#L27-L75 it should be rather straightforward to port it to hapi :)

rhcarvalho commented 4 years ago

In addition to borrowing code from the Express integration, we document manual instrumentation at: https://docs.sentry.io/product/performance/getting-started/?platform=node

Closing for now as we've reached a happy ending for the header problem :)

dhruska commented 4 years ago

Amazing - thanks so much!

dhruska commented 4 years ago

I hate to bother you all with one more question, but I'm struggling with this integration a little bit still, presumably because I still have something set up incorrectly. I'm seeing that lots of my graphql requests have the sentry-trace header, but not all. tracesSampleRate is set to 1.

Things I've tried:

rhcarvalho commented 4 years ago

@dhruska we definitely need to improve the documentation in this regard, sorry for the confusion.

For example, I have a page in my app that has five GraphQL calls on page load. The first three calls have the header, and the next two do not. Then, I do another interaction on my page triggering two more GraphQL requests, neither of which have the header. Then I do something to trigger a final request, which does have the header. It seems random. Is there somewhere else I need to specify tracesSampleRate maybe? Any other ideas?

Yes, from your description, it seems that the solution is to increase the idleTimeout property of the BrowserTracing integration.

What is most likely happening here is that both pairs of requests that go out without the sentry-trace header are falling out of any transaction as they happen after the pageload transaction has finished, and before the navigation transaction has started.

The pageload and navigation automatic transactions use an idle time to terminate because otherwise it is unclear when a page is fully loaded (some pages are actually never "fully loaded", think for example of infinite feeds that keep fetching data in the background).

The alternative solution, if you want full control of start and finish of transactions, is to disable the automatic instrumentation and use manual instrumentation. That is only really a better solution if you have a way to tell precisely when a transaction starts and when it ends.

Looking forward to hearing a success report from you :)

dhruska commented 4 years ago

@rhcarvalho I really appreciate you taking the time to break this down for me. It seems that that is definitely what's happening here. Our application is by and large a single page app. The few interactions that seemed to randomly successfully have the header attached occurred right after the query params in the URL changed, so I imagine that was being registered as a new navigation transaction, restarting the timeout.

Users might spend many minutes in our application, triggering actions but not doing any navigation. Is there any downside to setting idleTimeout to some very large number, like 1 hour? From what I can tell every interaction would then be captured under a single transaction, which probably also is not ideal. Maybe manual instrumentation is the way to go here, and we could create a new transaction for every GraphQL request.

rhcarvalho commented 4 years ago

Users might spend many minutes in our application, triggering actions but not doing any navigation. Is there any downside to setting idleTimeout to some very large number, like 1 hour?

Yes, there is. If a user closes a page earlier than the "idle timeout" (which means more time than the same time on a stopwatch), the transaction is lost.

With 1h idle timeout you're most likely going to miss a lot of transactions.

From what I can tell every interaction would then be captured under a single transaction, which probably also is not ideal. Maybe manual instrumentation is the way to go here, and we could create a new transaction for every GraphQL request.

From the brief description of your app, I agree that trying to fit every interaction in a single pageload/navigation transaction is not a good idea, as well as having a new transaction for each GraphQL request doesn't sound quite right, as probably that has no useful meaning to you.

Do you happen to control the backend server as well? To look at endpoint performance, it's best to instrument on the server -- instrumenting on the client adds all sorts of variability to your numbers (for example, client network latencies, etc).

My recommendation:

  1. Use automatic instrumentation on the frontend for pageload/navigation (probably with default settings or a small tweak).
  2. If possible, instrument the backend.
  3. If you have any other semantically relevant activity in the frontend that you want to measure, use manual instrumentation. For example, suppose you want to measure how long it takes for your users to click a certain button in your page -- create a manual transaction to measure that.
dhruska commented 4 years ago

This is helpful, thanks. We do control the backend and are instrumenting there as well - we copied the logic linked here to our backend implementation to instrument the backend and consume the sentry-trace header where appropriate, since we were trying to take advantage of the tracing capabilities where we could. We adapted the Express implementation to Hapi - would definitely be happy to copy our integration as a Hapi plugin in a pull request against this repo, if there is any interest.

Sounds like the best happy medium for us is to give a minor increase to the idleTimeout property as you described to capture all the requests on page load, and to just accept that not all requests will have the trace header, if the user interacts with the application after pageload without navigating, which is ok since we're capturing everything on the backend already - and if we determine we need more metrics we can use manual instrumentation as you described.

Really appreciate you breaking down how all of this works! Super helpful 😃

edzis commented 3 years ago

With 1h idle timeout you're most likely going to miss a lot of transactions.

I have a similar situation and am considering to use onbeforeunload to close any open transactions. Wouldn't that cover most of the potential misses with long timeouts after the last navigation?

revmischa commented 1 year ago

@dhruska could it be that Apollo is being imported/loaded before Sentry in your project? If that's the case, Apollo might be grabbing a reference to the original fetch before the Sentry SDK patches it.

I might be seeing this issue, but I'm using @sentry/next which performs some magic. I create my Apollo client in _app.tsx. How can I make sure @sentry/next is configuring BrowserTracing first?

mmahalwy commented 8 months ago

Have same problem @revmischa – any solution here?