getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License

`continueTrace` with `startInactiveSpan` does not work correctly with Sentry v8 #13126

Closed mikan3rd closed 3 weeks ago

mikan3rd commented 2 months ago

Is there an existing issue for this?

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using?

@sentry/node

SDK Version

8.20.0

Framework Version

No response

Link to Sentry event

https://bizforward.sentry.io/issues/5659614696/events/e33fe93c614b41779919285369087372/

Reproduction Example/SDK Setup

No response

Steps to Reproduce

In Sentry v7, I used continueTrace as follows, and checking the Sentry issue confirmed that the traceId generated on the frontend was successfully passed on to the backend.

```ts
const server = new ApolloServer({
  context: async ({ req }): Promise<Context> => {
    const sentryTraceHeaders = req.headers['sentry-trace'];
    const sentryTrace = Array.isArray(sentryTraceHeaders) ? sentryTraceHeaders.join(',') : sentryTraceHeaders;
    const baggage = req.headers['baggage'];
    const sentrySpan = Sentry.continueTrace({ sentryTrace, baggage }, (transactionContext) => {
      return Sentry.startInactiveSpan({
        ...transactionContext,
        op: 'gql',
        name: req.body.operationName,
      });
    });
    return { sentrySpan };
  },
});
```

In Sentry v8, the transactionContext argument of the continueTrace callback has been removed, so I changed the code as follows. However, after this change the traceId displayed in the Sentry issue differs between frontend and backend.

```ts
const server = new ApolloServer({
  context: async ({ req }): Promise<Context> => {
    const sentryTraceHeaders = req.headers['sentry-trace'];
    const sentryTrace = Array.isArray(sentryTraceHeaders) ? sentryTraceHeaders.join(',') : sentryTraceHeaders;
    const baggage = req.headers['baggage'];
    const sentrySpan = Sentry.continueTrace({ sentryTrace, baggage }, () => {
      return Sentry.startInactiveSpan({
        op: 'gql',
        name: req.body.operationName,
      });
    });
    return { sentrySpan };
  },
});
```

I would like to know the cause and how to work around it.

Do I have to use startSpan instead, and if so, how should I use startSpan with ApolloServer?

Expected Result

The trace ID in the issue should match the one in the sentry-trace request header.

Actual Result

The trace ID listed in the issue does not match the one in the sentry-trace request header.

mydea commented 2 months ago

Hey,

do you actually need to manually continue the trace at all in v8? I would have thought that this should be automatically continued anyhow?

Looking at the issue you linked, I do see the span you started there connected to the frontend: if you click "View full trace" at the top of the issue, the trace contains an http.client span with a child gql span. Is this not what you would expect? This seems correct to me! Or what would you expect to be different?

Generally speaking, could you share your Sentry.init() code, and possibly the output when you configure debug: true? I think GraphQL/Apollo should be auto-instrumented, so you shouldn't really have to do anything yourself to get spans for this. Maybe there is a deeper setup issue there 🤔
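
For reference, enabling debug output is just a flag on init; a minimal sketch (the DSN and sample rate values are placeholders):

```ts
import * as Sentry from '@sentry/node';

// Verbose SDK logging: prints integration setup and tracing decisions,
// which is the output requested above.
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  debug: true,
  tracesSampleRate: 1.0,
});
```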

mikan3rd commented 2 months ago

Sorry, I didn't provide enough links to refer to. See the following frontend and backend issues. "View full trace" shows that the trace ID allocated on the frontend side is not carried over to the backend.

mikan3rd commented 2 months ago

The following is the log:

Log ``` Sentry Logger [debug]: @opentelemetry/instrumentation-http Applying instrumentation patch for nodejs core module on require hook { module: 'http' } Sentry Logger [debug]: @opentelemetry/instrumentation-http Applying instrumentation patch for nodejs core module on require hook { module: 'https' } Sentry Logger [debug]: Recording is off, propagating context in a non-recording span Sentry Logger [debug]: @opentelemetry/instrumentation-http https instrumentation outgoingRequest Sentry Logger [debug]: @opentelemetry/instrumentation-http http.ClientRequest return request Sentry Logger [debug]: @opentelemetry/instrumentation-http outgoingRequest on response() Sentry Logger [debug]: @opentelemetry/instrumentation-http outgoingRequest on end() Sentry Logger [debug]: @opentelemetry/instrumentation-http outgoingRequest on request close() query: SELECT `User`.`id` AS `User_id`, `User`.`uid` AS `User_uid`, `User`.`email` AS `User_email`, `User`.`company_id` AS `User_company_id`, `User`.`created_at` AS `User_created_at`, `User`.`updated_at` AS `User_updated_at` FROM `users` `User` WHERE ((`User`.`uid` = ?)) LIMIT 1 -- PARAMETERS: ["CDx21CRc4Ub5QxqY6gOJjU7lqmU2"] { sentryTrace: '15cfb66b7f894742af09b2338291fd77-90da34b18370fa7c-1', baggage: 'XXX' } Sentry Logger [log]: [Tracing] Inheriting remote parent's sampled decision for UserDashboardTemplateLatestLead: true Sentry Logger [log]: [Tracing] Starting sampled root span op: gql name: UserDashboardTemplateLatestLead ID: c94eaf3187b58549 parent ID: 90da34b18370fa7c { sentrySpan: Span { attributes: { 'sentry.op': 'gql', 'sentry.sample_rate': 1, 'sentry.parentIsRemote': true }, links: [], events: [], _droppedAttributesCount: 0, _droppedEventsCount: 0, _droppedLinksCount: 0, status: { code: 0 }, endTime: [ 0, 0 ], _ended: false, _duration: [ -1, -1 ], name: 'UserDashboardTemplateLatestLead', _spanContext: { traceId: '15cfb66b7f894742af09b2338291fd77', spanId: 'c94eaf3187b58549', traceFlags: 1, traceState: [TraceState] }, parentSpanId: '90da34b18370fa7c', kind: 0, _performanceStartTime: 126573.554792, _performanceOffset: 0.185302734375, _startTimeProvided: false, startTime: [ 1722487020, 413000000 ], resource: Resource { _attributes: [Object], asyncAttributesPending: false, _syncAttributes: [Object], _asyncAttributesPromise: undefined }, instrumentationLibrary: { name: '@sentry/node', version: '8.20.0', schemaUrl: undefined }, _spanLimits: { attributeValueLengthLimit: Infinity, attributeCountLimit: 128, linkCountLimit: 128, eventCountLimit: 128, attributePerEventCountLimit: 128, attributePerLinkCountLimit: 128 }, _attributeValueLengthLimit: Infinity, _spanProcessor: MultiSpanProcessor { _spanProcessors: [Array] } } } [2024-08-01T04:37:00.416Z] INFO: 44273 on mf-3145-mm01: (userId=1) graphql: { "operationName": "UserDashboardTemplateLatestLead", "query": "query UserDashboardTemplateLatestLead {\n latestLead {\n leadSource\n ...UserDetailTable\n ...ApplicationDetailTable\n ...NextLeadRegistration\n ...BranchModalElements\n __typename\n }\n}\n\nfragment UserDetailTable on Lead {\n id\n lastName\n firstName\n company {\n id\n name\n postalCode\n prefecture\n address1\n address2\n __typename\n }\n __typename\n}\n\nfragment ApplicationDetailTable on Lead {\n id\n createdAt\n amount\n preferredDateClass\n preferredDate\n purpose\n __typename\n}\n\nfragment NextLeadRegistration on Lead {\n sessionId\n leadSource\n __typename\n}\n\nfragment BranchModalElements on Lead {\n amount\n preferredDate\n preferredDateClass\n createdAt\n __typename\n}", "variables": {} 
} [2024-08-01T04:37:00.466Z] ERROR: 44273 on mf-3145-mm01: (userId=1) errors: [ { "name": "Error", "message": "Sentry Test2!!!", "stack": "Error: Sentry Test2!!!\n at Object. (/Users/dev/src/app/resolvers/lead.ts:146:15)\n at Generator.next ()\n at /Users/dev/src/app/resolvers/lead.ts:8:71\n at new Promise ()\n at __awaiter (/Users/dev/src/app/resolvers/lead.ts:4:12)\n at Object.latestLead (/Users/dev/src/app/resolvers/lead.ts:146:24)\n at /Users/dev/node_modules/.pnpm/graphql-middleware@6.1.35_graphql@16.9.0/node_modules/graphql-middleware/dist/applicator.js:9:112\n at middleware (/Users/dev/node_modules/.pnpm/graphql-shield@7.6.5_graphql-middleware@6.1.35_graphql@16.9.0__graphql@16.9.0/node_modules/graphql-shield/cjs/generator.js:30:30)\n at processTicksAndRejections (node:internal/process/task_queues:95:5)" } ] [2024-08-01T04:37:00.467Z] ERROR: 44273 on mf-3145-mm01: Sentry Test2!!! formattedError: { "message": "Sentry Test2!!!", "locations": [ { "line": 2, "column": 3 } ], "path": [ "latestLead" ], "extensions": { "code": "INTERNAL_SERVER_ERROR", "stacktrace": [ "Error: Sentry Test2!!!", " at Object. (/Users/dev/src/app/resolvers/lead.ts:146:15)", " at Generator.next ()", " at /Users/dev/src/app/resolvers/lead.ts:8:71", " at new Promise ()", " at __awaiter (/Users/dev/src/app/resolvers/lead.ts:4:12)", " at Object.latestLead (/Users/dev/src/app/resolvers/lead.ts:146:24)", " at /Users/dev/node_modules/.pnpm/graphql-middleware@6.1.35_graphql@16.9.0/node_modules/graphql-middleware/dist/applicator.js:9:112", " at middleware (/Users/dev/node_modules/.pnpm/graphql-shield@7.6.5_graphql-middleware@6.1.35_graphql@16.9.0__graphql@16.9.0/node_modules/graphql-shield/cjs/generator.js:30:30)", " at processTicksAndRejections (node:internal/process/task_queues:95:5)" ] } } -- err: { "message": "Sentry Test2!!!", "locations": [ { "line": 2, "column": 3 } ], "path": [ "latestLead" ] } Sentry Logger [log]: [Tracing] Finishing "gql" root span "UserDashboardTemplateLatestLead" with ID c94eaf3187b58549 Sentry Logger [log]: SpanExporter exported 1 spans, 0 unsent spans remaining ```
Lms24 commented 2 months ago

Can you show us your Sentry.init call in your Apollo server? There are some suspicious logs that might hint at a config problem:

Recording is off, propagating context in a non-recording span

I wonder why this is printed at all, given we're starting spans afterwards

Finishing "gql" root span "UserDashboardTemplateLatestLead" with ID c94eaf3187b58549

So this suggests to me that a span was actually started and sent but it's not connected to the error that was thrown in between.

Also, would you mind letting us know why you need to call continueTrace at all? This shouldn't be necessary if you initialize the SDK correctly.
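
If the SDK is set up correctly, the context function could in principle be reduced to something like this (just a sketch; it assumes the incoming request has already been instrumented and the sentry-trace/baggage headers were picked up automatically):

```ts
const server = new ApolloServer({
  context: async ({ req }): Promise<Context> => {
    // No manual continueTrace: the incoming request is already part of the
    // frontend trace, so this span should attach to it automatically.
    const sentrySpan = Sentry.startInactiveSpan({
      op: 'gql',
      name: req.body.operationName,
    });
    return { sentrySpan };
  },
});
```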

mikan3rd commented 2 months ago

The following is my Sentry.init code:

Code

```ts
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.APP_ENV,
  enabled: true,
  debug: true,
  release: process.env.SENTRY_RELEASE,
  serverName: 'app',
  sampleRate: 1.0,
  tracesSampleRate: 1.0,
  normalizeDepth: 10,
  beforeSend(event, hint) {
    event.fingerprint = [
      '{{ default }}',
      event.exception?.values?.[0]?.type ?? '',
      event.exception?.values?.[0]?.value ?? '',
    ];
    return event;
  },
});

const server = new ApolloServer({
  context: async ({ req }): Promise<Context> => {
    const sentryTraceHeaders = req.headers['sentry-trace'];
    const sentryTrace = Array.isArray(sentryTraceHeaders) ? sentryTraceHeaders.join(',') : sentryTraceHeaders;
    const baggage = req.headers['baggage'];
    const sentrySpan = Sentry.continueTrace({ sentryTrace, baggage }, () => {
      return Sentry.startInactiveSpan({
        op: 'gql',
        name: req.body.operationName,
      });
    });
    return { sentrySpan };
  },
  plugins: [ApolloServerSentryPlugin],
});

const ApolloServerSentryPlugin: ApolloServerPlugin = {
  requestDidStart: async () => ({
    didEncounterErrors: async ({ contextValue, errors, request }) => {
      errors.forEach((error) => {
        reportError(error, { cloudTraceId: contextValue.cloudTraceId, user: contextValue.user, request });
      });
    },
    willSendResponse: async ({ contextValue }) => {
      contextValue.sentrySpan?.end();
    },
  }),
};

export const reportError = (
  err: Error | GraphQLError,
  { cloudTraceId, request, user }: { cloudTraceId?: string; request?: GraphQLRequest; user?: Context['user'] },
) => {
  Sentry.withScope((scope) => {
    scope.setTag('cloud_trace_id', cloudTraceId);
    scope.setExtra('error', err);
    if (request) {
      const { operationName, query, variables } = request;
      scope.setTag('operationName', operationName);
      scope.setContext('Apollo Error', {
        operationName,
        query,
        variables,
      });
    }
    if (err instanceof GraphQLError && err.path) {
      scope.addBreadcrumb({
        category: 'query-path',
        message: err.path.join(' > '),
        level: 'debug',
      });
    }
    if (user) {
      scope.setUser({
        id: user.userId?.toString(),
        uid: user.uid,
        email: user.email,
      });
    }
    Sentry.captureException(err);
  });
};
```
chargome commented 2 months ago

@mikan3rd Did you try running your Apollo server without manually continuing the trace, or is there a specific reason for the code inside your context function?

mikan3rd commented 2 months ago

I removed that code, including the continueTrace part, but the frontend and backend trace IDs remained different.

Log ``` Sentry Logger [debug]: @opentelemetry/instrumentation-http Applying instrumentation patch for nodejs core module on require hook { module: 'http' } Sentry Logger [debug]: @opentelemetry/instrumentation-http Applying instrumentation patch for nodejs core module on require hook { module: 'https' } Sentry Logger [debug]: Recording is off, propagating context in a non-recording span Sentry Logger [debug]: @opentelemetry/instrumentation-http https instrumentation outgoingRequest Sentry Logger [debug]: @opentelemetry/instrumentation-http http.ClientRequest return request Sentry Logger [debug]: @opentelemetry/instrumentation-http outgoingRequest on response() Sentry Logger [debug]: @opentelemetry/instrumentation-http outgoingRequest on end() Sentry Logger [debug]: @opentelemetry/instrumentation-http outgoingRequest on request close() query: SELECT `User`.`id` AS `User_id`, `User`.`uid` AS `User_uid`, `User`.`email` AS `User_email`, `User`.`company_id` AS `User_company_id`, `User`.`created_at` AS `User_created_at`, `User`.`updated_at` AS `User_updated_at` FROM `users` `User` WHERE ((`User`.`uid` = ?)) LIMIT 1 -- PARAMETERS: ["9E9EqEuw59hzY9dT4N96ZSFpxk53"] [2024-08-02T02:30:34.889Z] INFO: 14387 on mf-3145-mm01: (userId=27) graphql: { "operationName": "UserDashboardTemplateLatestLead", "query": "query UserDashboardTemplateLatestLead {\n latestLead {\n leadSource\n ...UserDetailTable\n ...ApplicationDetailTable\n ...NextLeadRegistration\n ...BranchModalElements\n __typename\n }\n}\n\nfragment UserDetailTable on Lead {\n id\n lastName\n firstName\n company {\n id\n name\n postalCode\n prefecture\n address1\n address2\n __typename\n }\n __typename\n}\n\nfragment ApplicationDetailTable on Lead {\n id\n createdAt\n amount\n preferredDateClass\n preferredDate\n purpose\n __typename\n}\n\nfragment NextLeadRegistration on Lead {\n sessionId\n leadSource\n __typename\n}\n\nfragment BranchModalElements on Lead {\n amount\n preferredDate\n preferredDateClass\n createdAt\n __typename\n}", "variables": {} } [2024-08-02T02:30:34.923Z] ERROR: 14387 on mf-3145-mm01: (userId=27) errors: [ { "name": "Error", "message": "SentryTest3", "stack": "Error: SentryTest3\n at Object. (/Users/dev/src/app/resolvers/lead.ts:146:15)\n at Generator.next ()\n at /Users/dev/src/app/resolvers/lead.ts:8:71\n at new Promise ()\n at __awaiter (/Users/dev/src/app/resolvers/lead.ts:4:12)\n at Object.latestLead (/Users/dev/src/app/resolvers/lead.ts:146:24)\n at /Users/dev/node_modules/.pnpm/graphql-middleware@6.1.35_graphql@16.9.0/node_modules/graphql-middleware/dist/applicator.js:9:112\n at middleware (/Users/dev/node_modules/.pnpm/graphql-shield@7.6.5_graphql-middleware@6.1.35_graphql@16.9.0__graphql@16.9.0/node_modules/graphql-shield/cjs/generator.js:30:30)\n at processTicksAndRejections (node:internal/process/task_queues:95:5)" } ] [2024-08-02T02:30:34.923Z] ERROR: 14387 on mf-3145-mm01: SentryTest3 formattedError: { "message": "SentryTest3", "locations": [ { "line": 2, "column": 3 } ], "path": [ "latestLead" ], "extensions": { "code": "INTERNAL_SERVER_ERROR", "stacktrace": [ "Error: SentryTest3", " at Object. 
(/Users/dev/src/app/resolvers/lead.ts:146:15)", " at Generator.next ()", " at /Users/dev/src/app/resolvers/lead.ts:8:71", " at new Promise ()", " at __awaiter (/Users/dev/src/app/resolvers/lead.ts:4:12)", " at Object.latestLead (/Users/dev/src/app/resolvers/lead.ts:146:24)", " at /Users/dev/node_modules/.pnpm/graphql-middleware@6.1.35_graphql@16.9.0/node_modules/graphql-middleware/dist/applicator.js:9:112", " at middleware (/Users/dev/node_modules/.pnpm/graphql-shield@7.6.5_graphql-middleware@6.1.35_graphql@16.9.0__graphql@16.9.0/node_modules/graphql-shield/cjs/generator.js:30:30)", " at processTicksAndRejections (node:internal/process/task_queues:95:5)" ] } } -- err: { "message": "SentryTest3", "locations": [ { "line": 2, "column": 3 } ], "path": [ "latestLead" ] } ```
mydea commented 2 months ago

Could you paste the logs from application startup? The ones that show e.g. "Integration added...." plus everything until the app has settled/fully started up?

mikan3rd commented 2 months ago

The following are the logs from application startup:

Logs ``` [INFO] 11:06:42 ts-node-dev ver. 2.0.0 (using ts-node ver. 10.9.2, typescript ver. 5.5.4) Debugger listening on ws://127.0.0.1:9229/bbbfeaeb-b7ea-48d8-81fe-f15da2092a96 For help, see: https://nodejs.org/en/docs/inspector Sentry Logger [log]: Initializing Sentry: process: 77113, thread: main. Sentry Logger [log]: Integration installed: InboundFilters Sentry Logger [log]: Integration installed: FunctionToString Sentry Logger [log]: Integration installed: LinkedErrors Sentry Logger [log]: Integration installed: RequestData Sentry Logger [log]: Integration installed: Console Sentry Logger [log]: Integration installed: Http Sentry Logger [log]: Integration installed: NodeFetch Sentry Logger [log]: Integration installed: OnUncaughtException Sentry Logger [log]: Integration installed: OnUnhandledRejection Sentry Logger [log]: Integration installed: ContextLines Sentry Logger [log]: Integration installed: LocalVariablesAsync Sentry Logger [log]: Integration installed: Context Sentry Logger [log]: Integration installed: Modules Sentry Logger [log]: Integration installed: Express Sentry Logger [log]: Integration installed: Fastify Sentry Logger [log]: Integration installed: Graphql Sentry Logger [log]: Integration installed: Mongo Sentry Logger [log]: Integration installed: Mongoose Sentry Logger [log]: Integration installed: Mysql Sentry Logger [log]: Integration installed: Mysql2 Sentry Logger [log]: Integration installed: Redis Sentry Logger [log]: Integration installed: Postgres Sentry Logger [log]: Integration installed: Nest Sentry Logger [log]: Integration installed: Hapi Sentry Logger [log]: Integration installed: Koa Sentry Logger [log]: Integration installed: Connect Sentry Logger [log]: Running in CommonJS mode. Sentry Logger [debug]: @opentelemetry/api: Registered a global for diag v1.9.0. Sentry Logger [debug]: @opentelemetry/api: Registered a global for trace v1.9.0. Sentry Logger [debug]: @opentelemetry/api: Registered a global for context v1.9.0. Sentry Logger [debug]: @opentelemetry/api: Registered a global for propagation v1.9.0. query: SELECT VERSION() AS `version` [2024-08-05T02:07:18.579Z] INFO: 77113 on mf-3145-mm01: 🚀 Server ready at http://localhost:4000/ ```
Lms24 commented 2 months ago

@mikan3rd Looking at your Sentry.init code above, and given that I don't see any logs about OpenTelemetry wrapping graphql (or related libraries): can you confirm that you call Sentry.init before you require any other module? This is crucial for all of our instrumentation (except http(s)) to work. More information here.
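
A common way to guarantee this (a sketch only; file names are illustrative) is a dedicated instrument file that is loaded before anything else:

```ts
// instrument.ts -- must be loaded before any other module
import * as Sentry from '@sentry/node';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 1.0,
});
```

In the entry point, `import './instrument';` then has to be the very first import (or the file can be preloaded, e.g. with `node --require ./instrument.js` for CommonJS builds).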

mikan3rd commented 1 month ago

Thanks for the reply. I have managed to get tracing to work by putting Sentry.init before all other imports. However, I am concerned that I am getting the following error after starting the server. What could be the problem?

Sentry Logger [debug]: @opentelemetry/instrumentation-http outgoingRequest on request error() Error: socket hang up
    at Socket.socketCloseListener (node:_http_client:473:25)
    at Socket.emit (node:events:531:35)
    at TCP.<anonymous> (node:net:339:12)
    at TCP.callbackTrampoline (node:internal/async_hooks:130:17) {
  code: 'ECONNRESET'
}
andreiborza commented 1 month ago

Hello @mikan3rd,

We are currently on company-wide hackweek and thus on limited support. We'll take a look at this next week.

lforst commented 1 month ago

@mikan3rd I believe this error can just be ignored. It may be something that your application swallowed up until now, but since the SDK hooks in at quite a low level, it now surfaces through a debug log. We have no reason to believe the SDK causes the socket hang up.

mikan3rd commented 3 weeks ago

My problem seems to be solved, thank you very much!