getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License
7.98k stars 1.57k forks source link

Hapi performance spans are off #11213

Open joshkel opened 7 months ago

joshkel commented 7 months ago

Is there an existing issue for this?

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using?

@sentry/node

SDK Version

7.107

Framework Version

No response

Link to Sentry event

No response

SDK Setup

  Sentry.init({
    dsn: process.env.SENTRY_DSN,
    tracesSampleRate: 0.2,
    integrations: excludeFalsy([
      new Sentry.Integrations.Anr({ captureStackTrace: true }),
      new Sentry.Integrations.Hapi({ server }),
      new Sentry.Integrations.Postgres(),
    ]),
  });

Steps to Reproduce

  1. Enable Sentry Hapi integration.
  2. Let our Hapi service handle some async requests (e.g., database queries).

Expected Result

Database queries are correctly correlated with their HTTP requests.

Actual Result

Requests may be mismatched.

Sentry's hapiTracingPlugin uses onPreHandler and onPostHandler events to try to narrow the scope of the Sentry span to just a single Hapi request. However, that doesn't ensure handling of async operations within the request handler itself.

I tried figuring out a solution to this. If I'm understanding Sentry's API correctly, it expects you to use withScope or withIsolationScope to wrap the request handler. However, I don't see how to make that work with hapiTracingPlugin's design of onPreHandler and onPostHandler events. If Sentry had some sort of startIsolationScope and endIsoluationScope method, then pre- and post- events could probably work. Otherwise, the only solution I can think of would be to more invasively patch / instrument a Hapi application to wrap the request handlers themselves. (I believe that's the approach used by OpenTelemetry's Hapi instrumentation.)

Test case: The following demo starts a Hapi service on a random port, with an asynchronous route handler, and fires off 4 requests to the service, separated by 100ms intervals. Each request should have its own timestamp and span ID, and none of the requests should include an HTTP status (because the requests haven't finished at the time they're logged), but they're all the same.

'use strict';

const Sentry = require('@sentry/node');
const Hapi = require('@hapi/hapi');
const { setTimeout } = require('node:timers/promises');

const init = async () => {
    const server = Hapi.server({
        host: 'localhost'
    });

    Sentry.init({
        // Enable Spotlight to allow running without a DSN.
        spotlight: true,

        integrations: [
            new Sentry.Integrations.Hapi({ server }),
        ]
    })

    server.route({
        method: 'GET',
        path: '/',
        handler: async () => {
            await setTimeout(2000);
            console.log(Sentry.getActiveSpan().toJSON());
            return 'Hello World!';
        }
    });

    await server.start();
    console.log('Server running on %s', server.info.uri);

    return server.info.uri;
};

async function main() {
  const url = await init();

  for (let i = 0; i < 4; i++) {
    fetch(url).then((res) => res.text()).then(console.log);
    await setTimeout(100);
  }
}

main();
onurtemizkan commented 7 months ago

Thanks for reporting this, and for the test case @joshkel, I'll have a look!

mydea commented 7 months ago

IMHO this is hopefully just fixed in v8, where we use the OTEL-powered hapi integration!

mydea commented 6 months ago

Hello, could you give this a try on v8.0.0-beta.3? This has the new OTEL-powered hapi integration, which hopefully improves on the span quality.

getsantry[bot] commented 5 months ago

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

Marsup commented 5 months ago

I've been also following v8 development as v7 had this issue. So far, when I tested v8.0.0, it had no breadcrumbs at all, and now testing again with v8.3.0, it has way too many breadcrumbs not pertaining to the query, so pretty similar to v7. But maybe I'm doing it wrong, so I'd love outside confirmation.

mydea commented 5 months ago

On v8.3, what breadcrumbs are you getting? Could you share the link to an example trace? Isolation should work there, we have tests for this too, but maybe there are some edge cases we are not handling properly - what does your setup look on v8?

Marsup commented 5 months ago

For my tests, I intentionally throw an error in a handler, and I'm getting http breadcrumbs that happened long before the request was even started, things that happened while the server was initializing. I'm not sure I can share a trace, we're just in a testing phase using self-hosted for now.

I have a pretty standard initialization:

  Sentry.init({
    dsn: DSN,
    environment: process.env.NODE_ENV,
    release: serviceVersion,
    tracesSampleRate: 1,
    initialScope: {
      ...,
    },
  });

  const server = await createServer();
  await Sentry.setupHapiErrorHandler(server);
  await server.start();
mydea commented 5 months ago

So it's normal to get breadcrumbs for stuff that happened before a request started. Not sure if this is what is happening here, but you have to think of the way we handle this as forking a tree - everything that was captured before we "forked" a process, will be inherited:

fetch('http://example.com/1');
fetch('http://example.com/2');

app.get('/my-route', function() {
  fetch('http://example.com/3')
  // ...
});

app.get('/my-route-2', function() {
  fetch('http://example.com/4')
  // ...
});

This example, when you call `/my-route', you'd get the breadcrumbs 1, 2 and 3 (but not 4, as this is in a separate isolated process). Does that make sense? Could that be what's happening here?

If you do not want this, you could do the following once your app is booted:

fetch('http://example.com/1');
fetch('http://example.com/2');
// other stuff happening before app is booted...

// Clear all data that was captured so far
// Starting with a "blank slate" from here on
Sentry.getIsolationScope().clear(); 

app.get(...)
getsantry[bot] commented 4 months ago

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

Marsup commented 4 months ago

I've given up on this to be honest, I'm getting error reports without any breadcrumbs, but it's better than having wrong spans. Maybe it's something better addressed in the OTEL stack.

mydea commented 4 months ago

I see! We'll continue to try to figure out what's happening there, I'll keep this issue open in todo until somebody finds some time to investigate this in more detail. Thank you either way!