getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License

Using Sentry breadcrumbs and tags alongside OpenTelemetry isn't working correctly #7538

Closed. seeARMS closed this issue 4 months ago.

seeARMS commented 1 year ago

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using? If you use the CDN bundles, please specify the exact bundle (e.g. bundle.tracing.min.js) in your SDK setup.

@sentry/node

SDK Version

7.43.0

Framework Version

No response

Link to Sentry event

https://paragraph.sentry.io/performance/node:08b586a3e77243548a96ac02db296d0f/?project=6606495&query=&statsPeriod=5m&transaction=POST+%2Fmetrics%2Fpost-track

SDK Setup

Sentry.init({
  dsn: "MY_DSN",

  enabled: isProduction,

  instrumenter: "otel",

  release: process.env.COMMIT,
  environment: process.env.FLY_APP_NAME || "development",

  tracesSampler: (samplingContext) => {
    // Drop transactions for routes we don't care about
    for (const route of SENTRY_IGNORED_ROUTES) {
      if (samplingContext?.request?.url?.endsWith(route)) return 0
    }

    // Drop transactions for non-production environments. (We still care about errors
    // on non-production environments, but we don't need to see all the transactions.)
    if (process.env.FLY_APP_NAME !== "paragraph-master") return 0

    return TRACE_SAMPLE_PERCENTAGE
  },

  profilesSampleRate: 1.0, // Profiling sample rate is relative to tracesSampleRate

  beforeSend: (event) => {
    if (!event.request?.data) return event

    const size = JSON.stringify(event.request?.data || "").length

    // Arbitrarily cut off massive payloads. Sentry silently drops events that are too large,
    // so trim them.
    // See https://github.com/getsentry/sentry-javascript/issues/3036#issuecomment-1066491190
    if (size > 500000) {
      event.request.data = event.request.data.slice(0, 500000)
    }

    return event
  },

  integrations: [new ProfilingIntegration()],
})
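As an aside, the size guard in beforeSend can be exercised in isolation. Below is a sketch of the same logic (trimEventData and MAX_SIZE are hypothetical names; the 500000 cutoff matches the config above):

```javascript
// Sketch: measure the serialized payload size and, if it exceeds the
// cutoff, fall back to a truncated string form so the event still fits.
const MAX_SIZE = 500000

function trimEventData(data) {
  const size = JSON.stringify(data ?? "").length
  if (size > MAX_SIZE) {
    return JSON.stringify(data).slice(0, MAX_SIZE)
  }
  return data
}

// Small payloads pass through untouched:
console.log(trimEventData({ ok: true })) // → { ok: true }

// Oversized payloads come back truncated to the cutoff:
console.log(trimEventData({ blob: "x".repeat(600000) }).length) // → 500000
```

Note that slicing the serialized string loses JSON validity, which is acceptable here because Sentry would otherwise silently drop the whole event.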

// Only load OpenTelemetry in production,
// so we aren't spamming Sentry with test data.
if (isProduction) {
  const sdk = new opentelemetry.NodeSDK({
    /*
    sampler: new opentelemetry.tracing.TraceIdRatioBasedSampler(
      TRACE_SAMPLE_PERCENTAGE
    ),
    */
    // Existing config
    traceExporter: new OTLPTraceExporter(),
    instrumentations: [
      new HttpInstrumentation({
        ignoreIncomingRequestHook(req) {
          for (const route of SENTRY_IGNORED_ROUTES) {
            if (req.url.endsWith(route)) {
              return true
            }
          }
          return false
        },
      }),

      new ExpressInstrumentation(),
      new GrpcInstrumentation({
        ignoreGrpcMethods: [
          "google.logging.v2.LoggingServiceV2/WriteLogEntries",
          new RegExp(".*google.logging.*"),
          new RegExp(".*WriteLogEntries*"),
          (name) => {
            return name.startsWith("google.logging")
          },
        ],
      }),
    ],

    // Sentry config
    spanProcessor: new SentrySpanProcessor(),
  })

  otelApi.propagation.setGlobalPropagator(new SentryPropagator())

  sdk.start()
}
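(Side note: the ignoreGrpcMethods option above mixes three matcher shapes: an exact string, RegExps, and a predicate function. A helper evaluating such a heterogeneous list might look like the sketch below; matchesAny is a made-up name for illustration, not the instrumentation's actual internals.)

```javascript
// Hypothetical helper: true if `name` matches any entry, where entries
// may be exact strings, RegExps, or predicate functions.
function matchesAny(name, matchers) {
  return matchers.some((m) => {
    if (typeof m === "string") return m === name
    if (m instanceof RegExp) return m.test(name)
    if (typeof m === "function") return Boolean(m(name))
    return false
  })
}

const ignored = [
  "google.logging.v2.LoggingServiceV2/WriteLogEntries",
  /.*google\.logging.*/,
  (name) => name.startsWith("google.logging"),
]

console.log(matchesAny("google.logging.v2.LoggingServiceV2/WriteLogEntries", ignored)) // → true
console.log(matchesAny("myservice.Users/GetUser", ignored)) // → false
```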

Steps to Reproduce

I've been happily using Sentry for some time, using @sentry/node with Express.

Today we switched from using Sentry tracing to OpenTelemetry-based tracing, using @sentry/opentelemetry-node. Transactions are getting logged fine, but it appears that breadcrumbs and tags are not correctly added to the transaction.

Viewing any Sentry event (e.g. the one linked above) shows tags from what seem like other requests. Additionally, the breadcrumbs don't correspond to this request either.

There's a chance I'm doing something incorrect in Sentry.init() above. For example, I don't need to use any Sentry tracing functionality now that I have OpenTelemetry tracing working, right? The documentation here is a bit light, so it's unclear to me.

Additionally, I'm getting an exception on startup (see logs below).

Expected Result

I'd expect to be able to use Sentry.addBreadcrumb and Sentry.setTag throughout the codebase, and have them add the breadcrumb and tag to the current transaction/span created by OpenTelemetry.

Actual Result

I get this non-fatal exception on startup:

2023-03-20T19:21:52Z app[53d5f709] iad [info]Error: @opentelemetry/api: Attempted duplicate registration of API: propagation
2023-03-20T19:21:52Z app[53d5f709] iad [info]    at registerGlobal (/workspace/node_modules/@opentelemetry/api/build/src/internal/global-utils.js:32:21)
2023-03-20T19:21:52Z app[53d5f709] iad [info]    at PropagationAPI.setGlobalPropagator (/workspace/node_modules/@opentelemetry/api/build/src/api/propagation.js:52:50)
2023-03-20T19:21:52Z app[53d5f709] iad [info]    at NodeTracerProvider.register (/workspace/node_modules/@opentelemetry/sdk-trace-base/build/src/BasicTracerProvider.js:100:31)
2023-03-20T19:21:52Z app[53d5f709] iad [info]    at NodeTracerProvider.register (/workspace/node_modules/@opentelemetry/sdk-trace-node/build/src/NodeTracerProvider.js:43:15)
2023-03-20T19:21:52Z app[53d5f709] iad [info]    at NodeSDK.start (/workspace/node_modules/@opentelemetry/sdk-node/build/src/sdk.js:158:24)
2023-03-20T19:21:52Z app[53d5f709] iad [info]    at Object.<anonymous> (/workspace/packages/api/dist/trace.cjs:127:7)
2023-03-20T19:21:52Z app[53d5f709] iad [info]    at Module._compile (node:internal/modules/cjs/loader:1159:14)
2023-03-20T19:21:52Z app[53d5f709] iad [info]    at Module._extensions..js (node:internal/modules/cjs/loader:1213:10)
2023-03-20T19:21:52Z app[53d5f709] iad [info]    at Module.load (node:internal/modules/cjs/loader:1037:32)
2023-03-20T19:21:52Z app[53d5f709] iad [info]    at Module._load (node:internal/modules/cjs/loader:878:12)

If I replace the NodeSDK init with the following, the error goes away, but I don't know if it's actually using SentryPropagator now:

  const sdk = new opentelemetry.NodeSDK({
    textMapPropagator: new SentryPropagator(),
... 
})
AbhiPrasad commented 1 year ago

Hey @seeARMS, thanks for writing in!

To isolate your breadcrumbs/tags, you need to use our express request handler!

As per the docs,

// The request handler must be the first middleware on the app
app.use(Sentry.Handlers.requestHandler());

This should make sure Sentry data is isolated.

I get this non-fatal exception on startup:

Yes, we need to adjust our docs for this: using a dependency-injected propagator is more correct than the global propagation API. Will fix!

AbhiPrasad commented 1 year ago

Created https://github.com/getsentry/sentry-docs/pull/6501 and https://github.com/getsentry/sentry-javascript/pull/7548 to update the docs!

seeARMS commented 1 year ago

Thanks for the help!

I gave that a try and now no breadcrumbs or tags are getting logged.

Transaction after the change, with no tags/breadcrumbs: https://paragraph.sentry.io/performance/node:7cb35584c76c4a598319153e8ad6b3aa/?project=6606495&query=&referrer=performance-transaction-summary&statsPeriod=2m&transaction=GET+%2Fblogs%2FxeXAY7C8qmb1h6Cp8zTa%2Fcommunities&unselectedSeries=p100%28%29

Same transaction but different event which occurred yesterday, with the erroneous tags/breadcrumbs as described above: https://paragraph.sentry.io/performance/node:a87a3b8398ee42e39c33dc5b9b1955f5/?project=6606495&query=&referrer=performance-transaction-summary&showTransactions=recent&statsPeriod=24h&transaction=GET+%2Fblogs%2FxeXAY7C8qmb1h6Cp8zTa%2Fcommunities&unselectedSeries=p100%28%29

Do I need to add the tracing integration (new Tracing.Integrations.Express({ app })) to Sentry.init as well as the tracing handler middleware (app.use(Sentry.Handlers.tracingHandler()))? Or is this not needed since I'm now using OpenTelemetry?

AbhiPrasad commented 1 year ago

You shouldn't need to use the Express integration nor the tracingHandler, since we are relying completely on OpenTelemetry to generate the spans.

The Sentry.Handlers.requestHandler is needed because it isolates the context between your requests. I don't understand why this is not working, though 🤔. Is the requestHandler the first Express middleware getting registered?

I need to reproduce this and will try tomorrow! If you could provide a quick repro, that would be a huge help as well!

Also 👋 to fellow uwaterloo grad ^^

seeARMS commented 1 year ago

Here's a repro: https://github.com/seeARMS/sentry-javascript-7538-reproduction

Clone it, run npm install, then npm run start. It successfully instantiates OpenTelemetry and Sentry, and the transaction gets created successfully when the / route is hit, but the single breadcrumb never gets displayed in Sentry.

Here's an event in question that was logged using the repro (no breadcrumb present): https://paragraph.sentry.io/performance/node:f116399485744a8ab2452906b6d39fee/?environment=sentry-reproduction&project=6606495&query=&referrer=performance-transaction-summary&statsPeriod=2m&transaction=GET+%2F&unselectedSeries=p100%28%29

And hello! Always great to see other uWaterloo grads in the wild 😄

AbhiPrasad commented 1 year ago

Hey @seeARMS, thank you so much for the reproduction! It took me a while to debug, but the issue is due to an inconsistency between how we isolate requests and how OpenTelemetry isolates requests. I opened a PR to fix this: https://github.com/getsentry/sentry-javascript/pull/7577

Thank you for your patience while we figured it out - this will be part of the next release of the SDK (going out end of week).

seeARMS commented 1 year ago

Nice! Thanks for the quick turn-around; really appreciate it!

seeARMS commented 1 year ago

@AbhiPrasad, this seems to be happening again. Did anything change in Sentry or OpenTelemetry?

A given transaction seems to have a lot of tags, breadcrumbs, and console.logs from other requests.

Eg: https://paragraph.sentry.io/issues/4018310436/?project=6606495&query=is%3Aunresolved&referrer=issue-stream&stream_index=1

Using @sentry/node and @sentry/opentelemetry-node version 7.61.1.

AbhiPrasad commented 1 year ago

Hey, apologies for the trouble - let's try to figure out what's happening.

Looking at https://github.com/getsentry/sentry-javascript/commits/develop/packages/opentelemetry-node, nothing seems to have changed in the top-level otel integration, but maybe it was a regression in the core functionality or in otel itself.

Did you upgrade your node version since then? Also could you share your otel version?

A reproduction would help speed up the investigation. I don't have a ton of bandwidth the next couple of days; @mydea, could you help take a look?

seeARMS commented 1 year ago

I cloned my existing repo above (https://github.com/seeARMS/sentry-javascript-7538-reproduction), updated all dependencies to the latest using npm-check-updates (ncu --upgrade), and the / route actually isn't even getting logged anymore.

Using the old dependencies (eg cloning the repo above as-is) shows Sentry working as intended:

➜ npm run start
npm WARN cli npm v9.8.1 does not support Node.js v17.8.0. This version of npm supports the following node versions: `^14.17.0 || ^16.13.0 || >=18.0.0`. You can find the latest version at https://nodejs.org/.

> start
> node --require ./tracing.js app.js

Sentry Logger [log]: Integration installed: InboundFilters
Sentry Logger [log]: Integration installed: FunctionToString
Sentry Logger [log]: Integration installed: Console
Sentry Logger [log]: HTTP Integration is skipped because of instrumenter configuration.
Sentry Logger [log]: Integration installed: Http
Sentry Logger [log]: Integration installed: Undici
Sentry Logger [log]: Integration installed: OnUncaughtException
Sentry Logger [log]: Integration installed: OnUnhandledRejection
Sentry Logger [log]: Integration installed: ContextLines
Sentry Logger [log]: Integration installed: LocalVariables
Sentry Logger [log]: Integration installed: Context
Sentry Logger [log]: Integration installed: Modules
Sentry Logger [log]: Integration installed: RequestData
Sentry Logger [log]: Integration installed: LinkedErrors
Sentry Logger [log]: Integration installed: ProfilingIntegration
http server listening on port 8085
Sentry Logger [log]: [Tracing] starting undefined transaction - GET
Sentry Logger [log]: [Profiling] started profiling transaction: GET
Sentry Logger [log]: [Tracing] Starting '< unknown op >' span on transaction 'GET' (9cfa7e02b183fb92).
Sentry Logger [log]: [Tracing] Finishing '< unknown op >' span on transaction 'GET' (9cfa7e02b183fb92).
Sentry Logger [log]: [Tracing] Starting '< unknown op >' span on transaction 'GET' (9cfa7e02b183fb92).
Sentry Logger [log]: [Tracing] Finishing '< unknown op >' span on transaction 'GET' (9cfa7e02b183fb92).
Sentry Logger [log]: [Tracing] Starting '< unknown op >' span on transaction 'GET' (9cfa7e02b183fb92).
Sentry Logger [log]: [Tracing] Finishing '< unknown op >' span on transaction 'GET' (9cfa7e02b183fb92).
Sentry Logger [log]: [Tracing] Starting '< unknown op >' span on transaction 'GET' (9cfa7e02b183fb92).
Sentry Logger [log]: [Tracing] Finishing '< unknown op >' span on transaction 'GET' (9cfa7e02b183fb92).
Triggered route
Sentry Logger [log]: [Profiling] stopped profiling of transaction: GET /
Sentry Logger [log]: [Tracing] Finishing http.server transaction: GET /.
Sentry Logger [log]: [Tracing] starting undefined transaction - POST
Sentry Logger [log]: [Profiling] started profiling transaction: POST

After upgrading all deps to the latest, there are no Sentry debug lines showing that tracing or profiling is working:

➜ npm run start

> start
> node --require ./tracing.js app.js

Sentry Logger [log]: Integration installed: InboundFilters
Sentry Logger [log]: Integration installed: FunctionToString
Sentry Logger [log]: Integration installed: Console
Sentry Logger [log]: HTTP Integration is skipped because of instrumenter configuration.
Sentry Logger [log]: Integration installed: Http
Sentry Logger [log]: Integration installed: Undici
Sentry Logger [log]: Integration installed: OnUncaughtException
Sentry Logger [log]: Integration installed: OnUnhandledRejection
Sentry Logger [log]: Integration installed: ContextLines
Sentry Logger [log]: Integration installed: LocalVariables
Sentry Logger [log]: Integration installed: Context
Sentry Logger [log]: Integration installed: Modules
Sentry Logger [log]: Integration installed: RequestData
Sentry Logger [log]: Integration installed: LinkedErrors
Sentry Logger [log]: Integration installed: ProfilingIntegration
http server listening on port 8085
Triggered route

Here are the OTEL versions from my prod install:

    "@opentelemetry/api": "^1.4.1",
    "@opentelemetry/instrumentation-bunyan": "^0.31.1",
    "@opentelemetry/instrumentation-fs": "^0.7.1",
    "@opentelemetry/instrumentation-grpc": "^0.36.0",
    "@opentelemetry/instrumentation-http": "^0.36.0",
    "@opentelemetry/sdk-node": "^0.36.1",
    "@sentry/opentelemetry-node": "^7.61.1",
    "@sentry/profiling-node": "^1.1.2",
    "@sentry/tracing": "^7.61.1",
    "opentelemetry-instrumentation-express": "^0.39.1",

I don't believe I've upgraded Node.

mydea commented 1 year ago

Thank you for the repro! I dug into it, and it seems the problem is @opentelemetry/instrumentation-grpc v0.41.2; it works when pinned to v0.41.1.

It works for me with these dependencies:

"@opentelemetry/api": "1.4.1",
"@opentelemetry/core": "1.15.2",
"@opentelemetry/exporter-trace-otlp-grpc": "0.41.2",
"@opentelemetry/instrumentation-express": "0.33.0",
"@opentelemetry/instrumentation-grpc": "0.41.1",
"@opentelemetry/instrumentation-http": "0.41.2",
"@opentelemetry/sdk-node": "0.41.2",
"@opentelemetry/semantic-conventions": "1.15.2",
"@sentry/node": "7.62.0",
"@sentry/opentelemetry-node": "7.62.0",
"@sentry/profiling-node": "1.1.2",

I need to dig a bit further, but there are two relevant changes between 0.41.1 and 0.41.2:

(just looking at the "not capturing at all" issue right now, not yet into breadcrumbs)

mydea commented 1 year ago

Some further digging, I opened an issue for the grpc instrumentation here: https://github.com/open-telemetry/opentelemetry-js/issues/4053

seeARMS commented 1 year ago

Thanks! Saw your comments. I upgraded all @sentry packages to 7.65.0 and confirmed the GrpcInstrumentation is instantiated after the HttpInstrumentation, and tags/breadcrumbs are still not getting logged correctly.

mydea commented 1 year ago

Thanks! Saw your comments. I upgraded all @sentry packages to 7.65.0 and confirmed the GrpcInstrumentation is instantiated after the HttpInstrumentation, and tags/breadcrumbs are still not getting logged correctly.

Can you share some more details on what is not happening correctly?

It would also be helpful, if you are on SaaS, if you could share links to one or two events/transactions that have incorrect data attached!

seeARMS commented 1 year ago

Sentry seems to be combining the breadcrumbs and tags from multiple transactions into a single transaction.

For example, consider this error. The breadcrumbs attached to that event are from many different requests, not only the request that led to the error. (You can see this via the "Sending custom blog email" and "Sending newsletter email" breadcrumbs; those are triggered by distinct, completely unrelated requests.) The same thing is happening with the tags: the error has a large number of tags added to it, but only a subset of them correspond to the actual Sentry.setTags call made on the request that triggered the error.

Two more events where this is happening:

I'll try to see if I can put together a reproduction, similar to what I did earlier.

EDIT: Actually, do I need to do anything with this? This seems to be exactly what I'm encountering, and we use async functions all over the codebase: https://docs.sentry.io/platforms/node/configuration/async-context/. I also found https://github.com/getsentry/sentry-javascript/issues/4071, which again looks identical to what's happening here. Was OTEL ever tested in conjunction with the async context fixes?

mydea commented 1 year ago

So I think I see the problem there. Yes, you still need to set up Sentry to have a correct async context. If you use Express, you can use the provided request handler:

import * as Sentry from '@sentry/node';

const app = express();

app.use(Sentry.Handlers.requestHandler());

If you are using something else, you'll have to do that yourself based on the docs you linked (https://docs.sentry.io/platforms/node/configuration/async-context/).

Basically, while OTEL keeps its own context based on async context, Sentry errors do not know about this, so you'll still have to tell Sentry how/when to create an async context as well.

Can you give this a try and let us know if that still does not resolve the issue?

seeARMS commented 1 year ago

I've actually always had that handler present.

This command is what starts the server:

"start-server": "node -r dotenv-flow/config --async-stack-traces --experimental-modules --es-module-specifier-resolution=node dist/trace.cjs"

And here is trace.cjs:

// This file NEEDS to be CommonJS. Opentelemetry does not work with
// ESM modules. So, we use commonJS and import the app file
// at the bottom.
const Sentry = require("@sentry/node")
const {
  SentrySpanProcessor,
  SentryPropagator,
} = require("@sentry/opentelemetry-node")

const { HttpInstrumentation } = require("@opentelemetry/instrumentation-http")
const { GrpcInstrumentation } = require("@opentelemetry/instrumentation-grpc")
const opentelemetry = require("@opentelemetry/sdk-node")
const { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-grpc")
const { ProfilingIntegration } = require("@sentry/profiling-node")

const {
  ExpressInstrumentation,
} = require("opentelemetry-instrumentation-express")

Sentry.init({
  dsn: "<snip>",

  // Disable entirely on non-production environments.
  enabled: isProduction,
  debug: process.env.SENTRY_DEBUGGING === "true",

  instrumenter: "otel",

  release: process.env.COMMIT,
  environment: process.env.FLY_APP_NAME || "development",

  // We need to use Sentry trace sampling, and NOT opentelemetry,
  // so that tags are properly set on the transaction.
  tracesSampler: (samplingContext) => {
    return 0.03
  },

  profilesSampleRate: 1.0, // Profiling sample rate is relative to tracesSampleRate

  integrations: [
    new Sentry.Integrations.OnUncaughtException({
      // We handle our own exceptions.
      exitEvenIfOtherHandlersAreRegistered: false,
    }),
    new ProfilingIntegration(),
  ],
})

  const sdk = new opentelemetry.NodeSDK({
    textMapPropagator: new SentryPropagator(),

    // Existing config
    traceExporter: new OTLPTraceExporter(),
    instrumentations: [
      new ExpressInstrumentation({
        includeHttpAttributes: true,
        // requestHook: (span, data) => {
        //  console.log("requestHook", span, data)
        //  },
      }),
      new HttpInstrumentation(),

      new GrpcInstrumentation({
        ignoreGrpcMethods: [
          "google.logging.v2.LoggingServiceV2/WriteLogEntries",
          new RegExp(".*google.logging.*"),
          new RegExp(".*WriteLogEntries*"),
          (name) => {
            return name.startsWith("google.logging")
          },
        ],
      }),
    ],

    // Sentry config
    spanProcessor: new SentrySpanProcessor(),
  })

  sdk.start()

// For some reason we need to require express here, so it's
// patched correctly. If we don't do this and only rely on the
// express in app.ts, the route-level instrumentation does not work.
require("express")

import("./app")

And the app.ts file:

import f from "dotenv-flow"
f.config()

import express from "express"
import cookieParser from "cookie-parser"
import morgan from "morgan"

import { API_HOST } from "./constants"
import SentryLogger from "./util/log"

import * as Sentry from "@sentry/node"

import lb from "@google-cloud/logging-bunyan"

import cors from "cors"

import {
  optionalVerifyToken,
  verifyToken,
  impersonationMiddleware,
} from "./middleware/auth"

import { db } from "./db/db"
import Logger from "bunyan"

import initialize from "./util/amplitude"
initialize()

const app = express()

// The request handler must be the first middleware on the app
app.use(Sentry.Handlers.requestHandler())

app.use(cors({ origin: true, credentials: true }))
app.options("*", cors())

  const { logger, mw } = await lb.express.middleware({
    logName: "api-" + process.env.FLY_APP_NAME,
    level: "debug",
  })

  // Log to Sentry as well.
  logger.addStream({
    type: "raw",
    stream: new SentryLogger(),
    closeOnExit: false,
    level: "debug",
  })

  app.use(mw)

// Other routes

And the SentryLogger (the ErrorReporting class), which actually adds the breadcrumbs and tags:

export default class ErrorReporting {
  public write(record: BunyanLogRecord) {
    const level = mapLevelToSeverity(record.level)

    const tags = filterObject(record)

    if (tags) {
      Sentry.setTags(tags)
    }

    // Everything except fatal, just log a breadcrumb
    if (!["error", "fatal"].includes(level) && record.msg) {
      Sentry.addBreadcrumb({
        level: mapLevelToSeverity(record.level),
        message: record.msg,
      })
      return
    }

    console.log("Reporting Sentry error...")
    console.log(record)

    if (record.err) {
      Sentry.setExtra("error", record.err)
      if (record.err instanceof Error) {
        Sentry.captureException(record.err)
      }
    }

    if (record.msg) {
      Sentry.captureMessage(record.msg, "error")
    }

    Sentry.flush(10000)
  }
}

Let me know if anything looks off here. Do I need to add the runWithAsyncContext call to somehow wrap the Bunyan logger class?

mydea commented 1 year ago

Thanks for the detailed info!

Hmm, generally if ErrorReporting.write is called inside the function scope of the Express route handlers, it should be attached correctly from what I see. I see a few possibilities for what could be happening there:

  1. The error reporter is running outside of the route handler. In this case, we will have a hard time attaching this correctly and may instead attach tags/breadcrumbs to the global hub. Think of it like this:
const globalHub = getCurrentHub(); // <-- the global hub

app.get('/my-route', function() {
  const hub = getCurrentHub(); // <-- this is a forked hub accessible inside of this callback
});

app.get('/my-other-route', function() {
  const hub = getCurrentHub(); // <-- this is another forked hub
});

ErrorReporter.write(error); // <-- this will attach stuff to the global hub

So if any event processing etc. happens outside of the route handler, there is no easy way to isolate what it does.

  2. Sentry & OTEL context/hubs mismatch. This may be happening as well, since we do the forking in parallel, possibly in different scenarios. If this is the case, we could maybe fix it by using a custom context manager/async strategy. I actually have an open PR tentatively/experimentally implementing something like this: https://github.com/getsentry/sentry-javascript/pull/8797. You need:

a) A custom OpenTelemetry ContextManager. I extended the default async-hooks one to make sure we also fork the hub correctly whenever we get a new OTEL context. We store the hub on the OTEL context so we can refer to it.

b) A custom Sentry AsyncContextStrategy. I wrote one that simply defers to the OTEL context for this and fetches the hub we forked in a) from it.

If you're brave, you could give this a try, but note that this is still very much experimental. Sadly, it is not super straightforward to keep context/hub/scope in sync 😬

cc @AbhiPrasad, who may also have other ideas/clues about what's happening there.

getsantry[bot] commented 11 months ago

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

seeARMS commented 11 months ago

Re-opening this since it's still an issue. I've tried the latest @sentry/node (7.81.1) and the latest @sentry/node-experimental (same version), but I'm still encountering the issues with leaking breadcrumbs and tags. This makes it pretty challenging to debug errors in our app, given that we can't trust the various user tags added to the event.

lforst commented 11 months ago

@mydea when you're back

mydea commented 11 months ago

Re-opening this since it's still an issue. I've tried the latest @sentry/node (7.81.1) and the latest @sentry/node-experimental (same version), but I'm still encountering the issues with leaking breadcrumbs and tags. This makes it pretty challenging to debug errors in our app, given that we can't trust the various user tags added to the event.

Can you show me the setup you are using? FWIW, when using node-experimental you should not also install/use @sentry/node in your code; you should only install @sentry/node-experimental and use the imports from there. Not sure if that is related here, though.

seeARMS commented 10 months ago

I completely removed OTEL and reverted back to the latest @sentry/node, and the issue with the erroneous breadcrumbs is still occurring, so I think stuff may be getting written to the global scope.

The setup is still pretty similar to the code I shared above:

My init_sentry:

import * as Sentry from "@sentry/node"

import { ProfilingIntegration } from "@sentry/profiling-node"

const SENTRY_IGNORED_ROUTES = [
  "/health",
  "/index.php",
  "/robots.txt",
  "/.env",
  "/favicon.ico",
  "/amp",
  // This is a very high-traffic route, so ignore it entirely on Sentry.
  /^\/crypto\/highlights\/contract\/[^/]+\/token\/[^/]+$/, // Regular expression for /crypto/highlights/contract/:address/token/:tokenId
]

const isProduction =
  process.env.FLY_APP_NAME === "paragraph-master" ||
  process.env.SENTRY_DEBUGGING

const TRACE_SAMPLE_PERCENTAGE = process.env.TRACE_SAMPLE_PERCENTAGE
  ? parseFloat(process.env.TRACE_SAMPLE_PERCENTAGE)
  : 0.04

function trimData(data: any) {
  const size = JSON.stringify(data || "").length

  // Arbitrarily cut off massive payloads. Sentry silently drops events that are too large,
  // so trim them.
  // See https://github.com/getsentry/sentry-javascript/issues/3036#issuecomment-1066491190
  if (size > 200000) {
    return data?.slice(0, 200000)
  }

  return data
}

function checkIfSentryIgnore(url?: string) {
  if (!url) return false

  for (const route of SENTRY_IGNORED_ROUTES) {
    if (typeof route === "string" && url.endsWith(route)) {
      return true
    } else if (route instanceof RegExp && route.test(url)) {
      return true
    }
  }

  return false
}

export default function initSentry(app: any) {
  Sentry.init({
    dsn: "<snip>",

    // Disable entirely on non-production environments.
    enabled: !!isProduction,
    debug: !!process.env.SENTRY_DEBUGGING,

    includeLocalVariables: true,

    // instrumenter: "otel",

    release: process.env.COMMIT,
    environment: process.env.FLY_APP_NAME || "development",

    // We need to use Sentry trace sampling, and NOT opentelemetry,
    // so that tags are properly set on the transaction.
    tracesSampler: (samplingContext) => {
      if (checkIfSentryIgnore(samplingContext?.request?.url)) return 0

      if (process.env.SENTRY_DEBUGGING) {
        return 1.0
      }

      return TRACE_SAMPLE_PERCENTAGE
    },

    profilesSampleRate: 1.0, // Profiling sample rate is relative to tracesSampleRate

    beforeSend: (event) => {
      try {
        if (event.request?.data) {
          event.request.data = trimData(event.request.data)
        }
        if (event.message) {
          event.message = trimData(event.message)
        }

        return event
      } catch (e) {
        console.error("Error in Sentry beforeSend", e, event)
        return event
      }
    },

    integrations: [
      new Sentry.Integrations.OnUncaughtException({
        // We handle our own exceptions.
        exitEvenIfOtherHandlersAreRegistered: false,
      }),

      // @ts-ignore
      new ProfilingIntegration(),
      new Sentry.Integrations.Http({ tracing: true }),
      new Sentry.Integrations.Express({ app }),
    ],
  })
}

My app.ts:

  import initSentry from "./init_sentry"

  import express from "express"

  const app = express()

  initSentry(app)

  // This uses the Google Cloud Logging Bunyan library to install middleware.
  // Now, when I use `req.log.debug('blah')`, it logs to Google Cloud but also to Sentry.
  const { logger: gcpLogger, mw } = await lb.express.middleware({
    logName: "api-" + process.env.FLY_APP_NAME,
    level: "debug",
  })

  // Log to Sentry as well.
  gcpLogger.addStream({
    type: "raw",
    stream: new SentryLogger(),

    closeOnExit: false,
    level: "debug",
  })

  app.use(mw)
  import indexRouter from "./routes/index"
  app.use("/", indexRouter)

indexRouter:


// Example route

import { Router } from "express"
const router = Router()

router.get("/", async function (req, res, next) {
  req.log.debug({ sentryBreadcrumb: true }, "Sentry tags")
  return res.json({ success: true })
})

The SentryLogger (ErrorReporting class):

import { BunyanLogRecord } from "@google-cloud/logging-bunyan/build/src/types/core"

import * as Sentry from "@sentry/node"

import { SeverityLevel } from "@sentry/node"

function mapLevelToSeverity(level: BunyanLogRecord["level"]): SeverityLevel {
  const numLevel = parseInt(level || "0", 10)

  switch (numLevel) {
    //case 10:
    //  return "trace"
    case 20:
      return "debug"
    case 30:
      return "info"
    case 40:
      return "warning"
    case 50:
      return "error"
    case 60:
      return "fatal"
    default:
      return "debug"
  }
}

function isString(value: any): value is string {
  return typeof value === "string"
}

function isPrimitive(value: any): value is string | number | boolean {
  return (
    isString(value) || typeof value === "number" || typeof value === "boolean"
  )
}

type FilteredObject = { [key: string]: string | number | boolean }

function filterObject(obj: any): FilteredObject | undefined {
  if (!obj || typeof obj !== "object") return

  const filteredObj: FilteredObject = {}

  Object.entries(obj).forEach(([key, value]) => {
    if (isString(key) && isPrimitive(value) && !IGNORED_LABELS.includes(key)) {
      filteredObj[key] = value
    }
  })

  return filteredObj
}

const IGNORED_LABELS = [
  "level",
  "msg",
  "time",
  "v",
  "name",
  "logging.googleapis.com/trace",
  "logging.googleapis.com/spanId",
  // These are already provided by Sentry and should not be overwritten.
  "environment",
  "release",
  "server_name",
  "user",
  "tags",
  "extra",
  "contexts",
  "request",
  "transaction",
  "modules",
  "breadcrumbs",
]

export const FLY_TAGS = {
  "fly.app.name": process.env.FLY_APP_NAME,
  "fly.alloc.id": process.env.FLY_ALLOC_ID,
  "fly.region": process.env.FLY_REGION,
  "fly.machine.id": process.env.FLY_MACHINE_ID,
  "fly.machine.version": process.env.FLY_MACHINE_VERSION,
  "fly.vm.memory.mb": process.env.FLY_VM_MEMORY_MB,
}

// Custom error reporting handler that passes
// data to Sentry on exception.
export default class ErrorReporting {
  public write(record: BunyanLogRecord) {
    const level = mapLevelToSeverity(record.level)

    Sentry.setTags(FLY_TAGS)

    const tags = filterObject(record)

    if (tags) {
      Sentry.setTags(tags)
    }

    // Everything except error and fatal: just log a breadcrumb.
    if (!["error", "fatal"].includes(level) && record.msg) {
      Sentry.addBreadcrumb({
        level,
        message: record.msg,

        // No need to attach the tags here, since we already set them
        // above.
        // data: tags,
      })
      return
    }

    console.log("Reporting Sentry error...")
    console.log(record)

    if (record.err) {
      Sentry.setExtra("error", record.err)
      if (record.err instanceof Error) {
        Sentry.captureException(record.err)
      }
    }

    if (record.msg) {
      Sentry.captureMessage(record.msg, "error")
    }

    Sentry.flush(10000)
  }
}
lforst commented 10 months ago

I feel like this is related to https://github.com/getsentry/sentry-javascript/issues/9874 and most likely has to do with the fact that when you create the logger stream (which is what creates the breadcrumbs), you break out of the AsyncLocalStorage context for the request. The context in which ErrorReporting#write runs is not the same as the context in which you handle the request.

This is a bit unfortunate because it is likely not something we can fix directly - the logging components and libraries themselves would have to make sure they preserve async context when `log.info()`, `log.debug()`, etc. are called.
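The kind of context loss described here can be sketched with plain Node `AsyncLocalStorage`, independent of Sentry or any particular logger (the names and the buffered-flush behavior are illustrative assumptions, not Sentry internals):

```typescript
import { AsyncLocalStorage } from "node:async_hooks"

const als = new AsyncLocalStorage<{ requestId: string }>()
const queue: string[] = []
const results: string[] = []

function writeLog(msg: string) {
  // A Sentry-like sink would read the current scope here; if the store is
  // gone, breadcrumbs land on the wrong (global) scope.
  const store = als.getStore()
  results.push(store ? `[${store.requestId}] ${msg}` : `[no-context] ${msg}`)
}

// Synchronous write inside the request context: the store is visible.
als.run({ requestId: "req-1" }, () => writeLog("direct"))

// A buffered logger flushes from a timer created at module load, outside
// any request context, so the store is lost by flush time.
setTimeout(() => {
  for (const msg of queue) writeLog(msg)
}, 10)
als.run({ requestId: "req-2" }, () => queue.push("buffered"))
```

The direct write sees `req-1`, while the buffered write loses the context entirely, which is exactly what mixed-up breadcrumbs across requests would look like.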

Just to make sure this is actually the case, if you call Sentry.addBreadcrumb() directly in the route handler, will it be correctly applied to the event?

seeARMS commented 9 months ago

I am not sure that is actually the case - I rewrote the code to completely bypass the logging library and use Sentry directly in a new middleware and it is still not working.

Example request with mixed-up breadcrumbs using this new code: https://paragraph.sentry.io/performance/node:bba0b8e75eb54d98897cbeb403ab5076/?environment=paragraph-master&project=6606495&query=http.method%3APOST&statsPeriod=5m&transaction=POST+%2Ftasks%2Fpublish

import type { NextFunction, Request, Response } from "express"
import * as Sentry from "@sentry/node"

export function customLoggerMiddleware(
  req: Request,
  res: Response,
  next: NextFunction
) {
  const createCustomLogger = (baseLogger: any) => {
    const handleLogging = (level: SeverityLevels, args: any[]) => {
      Sentry.setTags(FLY_TAGS)

      if (args.length > 0) {
        if (typeof args[0] === "object" && !Array.isArray(args[0])) {
          Sentry.setTags(args[0])
        }

        // ... snip, some custom logic to build breadcrumbData
        Sentry.addBreadcrumb(breadcrumbData)

        if (level === "error") {
          Sentry.captureException(new Error(args[messageIndex]))
        }
      }

      baseLogger[level](...args)
    }

    return new Proxy(baseLogger, {
      get: (target, property) => {
        if (
          ["debug", "info", "error", "fatal", "warn"].includes(
            property as string
          )
        ) {
          return (...args: any[]) => {
            handleLogging(property as SeverityLevels, args)
          }
        } else if (property === "child") {
          if ("fields" in target) {
            setSentryTagsFromFields(target.fields)
          }
          // Ensure custom behavior is applied to child loggers
          return (options: any) => createCustomLogger(target.child(options))
        }

        return target[property]
      },
    })
  }

  // Apply custom logger only if it hasn't been applied before
  if (!req.log || !req.__isCustomLogger) {
    const originalLogger = req.log
    req.log = createCustomLogger(originalLogger)
    req.__isCustomLogger = true
  }

  next()
}

This is used in app.ts:

app.use(customLoggerMiddleware)
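As a side note, the Proxy pattern in the middleware above reduces to a small, logger-agnostic sketch (the names and the hook signature here are illustrative, not part of any Sentry API):

```typescript
type Logger = Record<string, (...args: any[]) => void>

// Wrap a base logger so a hook runs before each level method, then delegate.
function wrapLogger(
  base: Logger,
  hook: (level: string, args: any[]) => void
): Logger {
  return new Proxy(base, {
    get(target, prop) {
      const level = String(prop)
      if (["debug", "info", "warn", "error", "fatal"].includes(level)) {
        return (...args: any[]) => {
          hook(level, args) // e.g. Sentry.addBreadcrumb(...)
          target[level](...args)
        }
      }
      return target[level]
    },
  })
}

// Usage: record what the hook and the base logger each see.
const calls: string[] = []
const base: Logger = { info: (msg) => calls.push(`base:${msg}`) }
const log = wrapLogger(base, (level, args) =>
  calls.push(`hook:${level}:${args[0]}`)
)
log.info("hello")
```

The hook fires first, then the call is delegated to the base logger unchanged, so the wrapper itself cannot reorder or drop log calls.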
lforst commented 9 months ago

Can you share what other parts of the SDK you used and in what order? You may need to wrap your middleware in Sentry.withIsolationScope(() => ...).

seeARMS commented 9 months ago

That didn't seem to fix it. Here is the relevant code:

// init_sentry.ts
import * as Sentry from "@sentry/node"

import { ProfilingIntegration } from "@sentry/profiling-node"

export default function initSentry(app: any) {
  Sentry.init({
    dsn: "snip",

    // Disable entirely on non-production environments.
    enabled: !!isProduction,
    debug: !!process.env.SENTRY_DEBUGGING,

    includeLocalVariables: true,

    release: process.env.COMMIT,
    environment: process.env.FLY_APP_NAME || "development",

    tracesSampler: (samplingContext) => {
      if (checkIfSentryIgnore(samplingContext?.request?.url)) return 0

      if (process.env.SENTRY_DEBUGGING) {
        return 1.0
      }

      return TRACE_SAMPLE_PERCENTAGE
    },

    profilesSampleRate: 1.0, // Profiling sample rate is relative to tracesSampleRate

    integrations: [
      new Sentry.Integrations.OnUncaughtException({
        // We handle our own exceptions.
        exitEvenIfOtherHandlersAreRegistered: false,
      }),

      // @ts-ignore
      new ProfilingIntegration(),
      new Sentry.Integrations.Http({ tracing: true }),
      new Sentry.Integrations.Express({ app }),
    ],
  })
}
// app.ts

import f from "dotenv-flow"
f.config()

import initSentry from "./init_sentry"

import express from "express"

const app = express()

initSentry(app)

import * as Sentry from "@sentry/node"
app.use(Sentry.Handlers.requestHandler())
app.use(Sentry.Handlers.tracingHandler())

const { logger: gcpLogger, mw } = await lb.express.middleware({
  logName: "api-" + process.env.FLY_APP_NAME,
  level: "debug",
  maxEntrySize: 200000,
})

app.use(mw)

// This is the Sentry logger middleware above
app.use(customLoggerMiddleware)

// lots of other middleware and routes

app.use(Sentry.Handlers.errorHandler())

const port = process.env.PORT || 8080

const server = app.listen(port, () => {
  logger.info(
    `HTTP server listening on port ${port} `
  )
})
lforst commented 9 months ago

Generally looks good. I think whatever the baseLogger is doing might lose the async context that isolates requests.

gajus commented 9 months ago

@seeARMS How are you getting samplingContext?.request?.url?

I am only getting:

{
  samplingContext: {
    transactionContext: { name: 'GET', parentSampled: undefined },
    parentSampled: undefined
  }
}

https://github.com/getsentry/sentry-javascript/discussions/10399#discussion-6149696

and looking at the implementation of @sentry/opentelemetry, that's the only thing they pass:

https://github.com/getsentry/sentry-javascript/blob/7dc92604e4a9448a17512d5622e28a3976b9267e/packages/opentelemetry/src/sampler.ts#L56C24-L62

mydea commented 9 months ago

@seeARMS How are you getting samplingContext?.request?.url?

I am only getting:

{
  samplingContext: {
    transactionContext: { name: 'GET', parentSampled: undefined },
    parentSampled: undefined
  }
}

#10399 (comment)

and looking at the implementation of @sentry/opentelemetry, that's the only thing they pass:

https://github.com/getsentry/sentry-javascript/blob/7dc92604e4a9448a17512d5622e28a3976b9267e/packages/opentelemetry/src/sampler.ts#L56C24-L62

We'll adjust this (still WIP) so that the tracesSampler receives the attributes of the span as sampling context - that would work for this case, wouldn't it?
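Once span attributes are forwarded, a URL-based sampler along these lines should become possible (the exact samplingContext shape, the `http.url` attribute name, and the rates are assumptions based on OTEL semantic conventions, not a confirmed Sentry API):

```typescript
type SamplingContext = { attributes?: Record<string, unknown> }

// Drop transactions for ignored routes by inspecting span attributes,
// mirroring the url-based check from the original tracesSampler.
const tracesSampler = (ctx: SamplingContext): number => {
  const url = ctx.attributes?.["http.url"]
  if (typeof url === "string" && url.endsWith("/healthz")) return 0
  return 0.1
}
```

The route suffix and fallback rate are placeholders; the point is only that the decision keys off span attributes rather than a `request` object.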

gajus commented 9 months ago

We'll adjust this (still WIP) so that the tracesSampler receives the attributes of the span as sampling context - that would work for this case, wouldn't it?

Any pointers on how to work around this in the meantime?

mydea commented 9 months ago

I'll open a PR to add this for the next release!

mydea commented 9 months ago

See: https://github.com/getsentry/sentry-javascript/pull/10426

mydea commented 9 months ago

Moving this here, out of the PR - reply to https://github.com/getsentry/sentry-javascript/pull/10426#issuecomment-1917984279:

@mydea Related – how do I know inside of request if the request was sampled or not?

I need to generate and inject sentry-trace meta attribute.

<meta content="${sentryTrace}" name="sentry-trace" />

I currently do this:

import { context, trace } from '@opentelemetry/api';

const activeContext = context.active();

const activeSpan = trace.getSpan(activeContext);

if (!activeSpan) {
  return {
    sentryTrace: null,
    spanId: null,
    traceId: null,
  };
}

const { spanId, traceId } = activeSpan.spanContext();

const sentryTrace = `${traceId}-${spanId}-0`;

but need to figure out how to adjust -0 to -1 when the request is sampled.

const traceparentData = spanToTraceHeader(span);
const dynamicSamplingContext = dynamicSamplingContextToSentryBaggageHeader(
  getDynamicSamplingContextFromSpan(span),
);
const contentMeta = `<head>
    <meta name="sentry-trace" content="${traceparentData}"/>
    <meta name="baggage" content="${dynamicSamplingContext}"/>
`;

Should work?

gajus commented 9 months ago

Just to save others some time, spanToTraceHeader, dynamicSamplingContextToSentryBaggageHeader and getDynamicSamplingContextFromSpan come from @sentry/utils and @sentry/core.

@mydea The span accepted by these methods is not the same as what is returned by OpenTelemetry's getSpan.

gajus commented 9 months ago

This was useful:

/**
 * @see https://github.com/getsentry/sentry-javascript/blob/8bec42e0285ee301e8fc9bcaf02046daf48e0495/packages/core/src/utils/spanUtils.ts#L103
 */
export const spanIsSampled = (span: opentelemetry.api.Span) => {
  const { traceFlags } = span.spanContext();

  // eslint-disable-next-line no-bitwise
  return Boolean(traceFlags & 0x1);
};
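Combining this with the header format discussed above, a self-contained sketch of building the `sentry-trace` value from a raw span context might look like this (`SpanContextLike` is a hypothetical minimal shape standing in for OTEL's `SpanContext`, not an official type):

```typescript
type SpanContextLike = { traceId: string; spanId: string; traceFlags: number }

// Emit -1 (sampled) or -0 (not sampled) based on the W3C sampled bit.
function toSentryTraceHeader(ctx: SpanContextLike): string {
  // eslint-disable-next-line no-bitwise
  const sampled = (ctx.traceFlags & 0x1) === 0x1
  return `${ctx.traceId}-${ctx.spanId}-${sampled ? "1" : "0"}`
}
```

In real code you would pass `activeSpan.spanContext()` from the earlier snippet straight into this helper.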
mydea commented 9 months ago

Just to save others some time, spanToTraceHeader, dynamicSamplingContextToSentryBaggageHeader and getDynamicSamplingContextFromSpan come from @sentry/utils and @sentry/core.

@mydea The span accepted by these methods is not the same as what is returned by OpenTelemetry's getSpan.

Ah right, of course - this work is still WIP. In v8, spans will be the same everywhere: you will generally get OpenTelemetry spans when interacting with the SDK. We hope to have an alpha version of this in a few weeks, where you should be able to have a fully integrated experience with the Sentry SDK & OpenTelemetry!

seeARMS commented 9 months ago

Generally looks good. I think whatever the baseLogger is doing might lose the async context that isolates requests.

Any other suggestions on what I can do here? I'm a bit at my wits' end and have spent far too long trying to get this to work.

lforst commented 9 months ago

@seeARMS Nope, I am lost. Either we have a bug in the SDK or your logger is the offender. Can you reproduce this with a minimal setup and share it with us? That would be amazing!

AbhiPrasad commented 4 months ago

Things have changed a lot since we last updated this - we now have official OTEL support in the SDK by default, which should make this experience much better.

Express with no custom OTEL setup: https://docs.sentry.io/platforms/javascript/guides/express/

custom OTEL setup (override sampler, processor, exporter): https://docs.sentry.io/platforms/javascript/guides/express/tracing/instrumentation/opentelemetry/#using-a-custom-opentelemetry-setup

Closing this issue as such.