getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License
7.94k stars 1.56k forks source link

inputBaggageString.split is not a function #5250

Closed pepf closed 2 years ago

pepf commented 2 years ago

Is there an existing issue for this?

How do you use Sentry?

Sentry Saas (sentry.io)

Which package are you using?

@sentry/nextjs

SDK Version

7.1.1

Framework Version

-

Link to Sentry event

https://sentry.io/organizations/anyfin/issues/3338863766/

Steps to Reproduce

  1. Upgrade to sentry/nextjs 7.1.1
  2. Have a backend running sentry/javascript using 7.0.0
  3. Trigger some requests.

Expected Result

No errors originating from sentry packgages on http calls.

Actual Result

inputBaggageString.split is not a function originating from: https://github.com/getsentry/sentry-javascript/blob/1456b9c8c62ee414fc4a6075406040fb1238a45e/packages/utils/src/baggage.ts#L74-L76

I verified this error is not triggered on the api or client routes of the nextjs package, but rather on our backend api running nextjs/javascript@7.0.0. However it only happens with requests from this specific "client" (sentry/nextjs@7.1.1).

Other sentry frontend clients connecting to the very same backend do net get this error.

Relevant section of sentry.client.config, as you can see no strange settings.

Sentry.init({
  dsn: SENTRY_DSN,
  enabled: ENVIRONMENT !== "local", // Enable only for prod envs
  // Adjust this value in production, or use tracesSampler for greater control
  tracesSampleRate: isProduction ? 0.5 : 0,
  environment: ENVIRONMENT,
  beforeBreadcrumb: excludeGraphQLFetch, // just a  small util.
lforst commented 2 years ago

Hi, we're looking into this. Sadly we're having difficulties pinpointing where the issue comes from.

If you're able to reproduce this locally, could you try and log the error right before you're rethrowing it? That would be extremely helpful. It would also be helpful to see logs of the sdk right before the error. You can enable logging by setting debug: true in the init function. In order to reproduce it, you probably have to set your tracesSampleRate to 1. Thanks!

pepf commented 2 years ago

Hey! I have a stack trace from when it happened on our remote servers which gives you a good idea of where this happens (the rest of the response structure is graphql format).

stacktrace masked

For now to counter this we have actually disabled tracing on the backend completely, e.g.:

integrations: [
      // enable HTTP calls tracing
      new SentryNode.Integrations.Http({ tracing: false }),

This makes the problem disappear, but we obviously lose some visibility here.

I will see if I can reproduce the problem locally.

lforst commented 2 years ago

Thank you @pepf that actually helps a lot, and confirms my suspicions. We have a bad type cast here: https://github.com/getsentry/sentry-javascript/blob/afe100009abcee883a8175a9de4880b9a04d483e/packages/node/src/integrations/http.ts#L126

requestOptions.headers.baggage is probably not a string in your case and that's why we're getting an error down the line.

It would be good to know what data type slips in there so we can defend against that.

If you're able to reproduce this locally, could you help us out by patching sentry and logging what ends up in headerBaggageString? We could simply add a

console.log(headerBaggageString);

in node_modules/@sentry/node/cjs/integrations/http.js:113.

The mitigation you have in place for now seems good. However, of course, we want to resolve this asap.

Thank you very much btw for debugging this with us!

pepf commented 2 years ago

I've tried reproducing it locally, but so far no luck.

Locally I'm running the services in docker, but in production, things are running in a k8s cloud with some Nginx things in front of it. Given that these "baggage" headers seem to be a standard, would it be possible for elements in the production chain.. (some global NAT, then Nginx, then express.js) to somewhere rewrite these headers in the wrong way?

lforst commented 2 years ago

Thank you very much for the effort!

That header is indeed a standard and we're using to propagate trace state. My current suspicion is that there are two different baggage headers on the incoming request, causing requestOptions.headers.baggage to be an array instead of a string.

I don't believe this is something we account for yet, as it kinda is an edge case (or rather specification violation by some service within the request chain), but we probably should. Baggage propagation in general is currently a big WIP for us.

I am aware you probably don't care about the SDK internals and just want this fixed 😄 - I am just writing my thoughts down so the rest of the team can also pick them up.

Lms24 commented 2 years ago

Hi @pepf I have a couple more questions about this issue.

Can you describe the call chain a little more? If I understand you correctly, you're using Next.js (with @sentry/nextjs@7.1.1) and then on another Node (Express) service, you're using @sentry/node@7.0.0? (You were mentioning "nextjs/javascript@7.0.0" and "sentry/javascript using 7.0.0", can I assume that you meant @sentry/node@7.0.0?)

What I would be interested in, is how the chain of service calls works. When this error happens, are you calling your Node service from your Next.js API? And in between, you have Nginx in your production environment?

I think that @lforst might be correct with the assumption that there are multiple baggage headers in the incoming request (which yes, is in fact a violation of the standard). I also want to make sure that it is not us who cause this.

One last question: Have you tried updating all your Sentry SDKs to 7.1.1 (asking because you were talking about 7.0.0 in the issue description)? I'm just wondering because 7.1.0 contains another baggage change. Generally, we recommend always keeping Sentry SDK versions in sync.

Lms24 commented 2 years ago

A couple of things I found out while trying to reproduce it:

When an incoming request has multiple baggage headers, Express seems to stitch them together into one header (at least with Express 4). So this unlikely to be the problem.

I was able to get the same error when the content of the baggage header is an array rather than a string

e.g. when doing something like

http.get({
    headers: { baggage: ["test=1sf", "my=val"] },
    host: "localhost:3000/test",
  });

This seems wrong to me but apparently, Express and http are fine with it which means it could theoretically happen...

This seems to emerge when a baggage header with this array content is added to the outgoing request before the Sentry SDK adds our baggage header. We have a function for this that takes care of then merging these headers: mergeAndSerializeBaggage. In this function we parse that first baggage header (if present) in parseBaggageString and that's how we get the error thrown (i.e. because an array is passed as an argument that's expected to be a string.

If this is indeed the source of the problem, the fix would be to simply check for the type with Array.isArray and convert the array to string or vice versa.

Gonna work on this a little more to find other potential causes but this is my best guess at the moment.

pepf commented 2 years ago

can I assume that you meant @sentry/node@7.0.0?)

Correct :) So I spoke with a colleague and the reason we didn't upgrade sentry/node to the latest version is this problem started happening right away for all of our clients (some web, some mobile, and some internal services that also have sentry tracing enabled). So if you think about it from that perspective, this issue might be more related to sentry/node 🤔 Since we also have a mobile application as part of our clients, there is no "easy" way to upgrade sentry across all of clients instantly and a gradual rollout would be prefered :)

What I would be interested in, is how the chain of service calls works. When this error happens, are you calling your Node service from your Next.js API? And in between, you have Nginx in your production environment?

We have next.js calling our api endpoints, which uses apollo-server-express to expose a graphql endpoint using express.js. Worht mentioning here might be that we use usage reporting trough apollo, which is "gathering usage reports & traces" in our production setup only, see: https://www.apollographql.com/docs/studio/metrics/usage-reporting/#pushing-metrics-from-apollo-server I can imagine this library interfering with other "trace-like" data.

Lms24 commented 2 years ago

Hi @pepf - we just released version 7.3.0. It contains a potential fix for this reported bug. Would you mind giving this version a try? Would be very much appreciated, thanks!

Lms24 commented 2 years ago

Closing this issue for now as it seems the problem is fixed. Let us know if it still comes up, then we'll re-open it.