getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License

Spans in continueTrace callback are not being recorded #13146

Open kparksalku opened 1 month ago

kparksalku commented 1 month ago

Is there an existing issue for this?

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using?

@sentry/aws-serverless

SDK Version

8.18.0

Framework Version

No response

Link to Sentry event

https://alku.sentry.io/performance/trace/d7e316d5f796b8b046d0c0cdbf5ca61c/?pageEnd&pageStart&project=4507617300054016&source=traces&statsPeriod=1h&timestamp=1722461032.632

Reproduction Example/SDK Setup

Initialization code for both Lambda functions:

  Sentry.init({
    debug: true,
    dsn: SENTRY_DSN,
    environment: "sandbox",
    integrations: [
      Sentry.postgresIntegration(),
      nodeProfilingIntegration()
    ],
    tracesSampleRate: 1.0,
    profilesSampleRate: 1.0,
  });

Lambda A:

export const handler = Sentry.wrapHandler(async (event) => {
  // Generate ClientContext headers

  // Invoke Lambda B with generated ClientContext

  // Return a response
});

Lambda B:

export const handler = async (event, context) => {
  const { baggage, sentryTrace } = extractFromLambdaClientContextHeaders(context);

  return await Sentry.continueTrace({ baggage, sentryTrace }, async () => {
    console.log(Sentry.getActiveSpan());
    const queryString = "select count(*) from myTable;";

    return await Sentry.startSpan({
      name: queryString,
      op: "db.sql.execute"
    }, async (span) => {
      console.log(span);
      return await executeQuery();
    });
  });
});
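
(extractFromLambdaClientContextHeaders is application code, not a Sentry SDK export. A minimal sketch of what it is assumed to do, given the ClientContext encoding shown later in this thread, and assuming the Node.js Lambda runtime exposes the decoded ClientContext JSON on context.clientContext:)

// Hypothetical helper, not part of the SDK: reads the values that Lambda A
// placed into the invocation's ClientContext. Assumes the Node.js runtime
// exposes the decoded ClientContext JSON directly on context.clientContext;
// adjust if your runtime nests it differently.
function extractFromLambdaClientContextHeaders(context) {
  const clientContext = context.clientContext || {};
  return {
    baggage: clientContext.baggage,
    sentryTrace: clientContext["sentry-trace"],
  };
}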

Steps to Reproduce

  1. Invoke Lambda A, which invokes Lambda B
  2. Check Traces, in Sentry.io

Expected Result

The trace should record the "db.sql.execute" span as a child span in the trace.

Actual Result

There is no record of the "db.sql.execute" span at all.

In the logs, I can see that continueTrace creates a nonRecordingSpan, which is not sent to Sentry. I believe this is the root of the problem. The call to startSpan appears to create a normal (recording) span, but since it is a child of the non-recording span, it is not sent along to Sentry either.
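
A quick way to verify this, assuming the v8 span API exposes the OpenTelemetry-style isRecording() method:

// Non-recording spans report false here and are not sent to Sentry.
const active = Sentry.getActiveSpan();
console.log("active span recording?", active ? active.isRecording() : "no active span");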

chargome commented 1 month ago

Hey @kparksalku, thanks for reaching out.

So ideally in v8 these spans should be automatically connected, or is there a specific reason why you use continueTrace?

Lms24 commented 1 month ago

Just to add to this: Have you tried using our Sentry.wrapHandler wrapper in Lambda function B?

Also, can you confirm that the headers are actually present when Lambda B is invoked?

kparksalku commented 1 month ago

Hey @kparksalku, thanks for reaching out.

So ideally in v8 these spans should be automatically connected, or is there a specific reason why you use continueTrace?

I'm not sure how it's possible that they would be automatically connected, since these are distributed systems. To answer your question, though, the reason I'm using continueTrace is that the documentation says to use it, and moreover, nothing else works.

kparksalku commented 1 month ago

Just to add to this: Have you tried using our Sentry.wrapHandler wrapper in Lambda function B?

Also, can you confirm that the headers are actually present when Lambda B is invoked?

Yes, I tried adding wrapHandler to Lambda B. When I do this, it creates what appears to be a normal (recording) span, but the trace ID is new; it's not the trace ID sent in the ClientContext. This raises another question as to why this never makes it to Sentry; I don't see this trace at all, and I would expect two separate traces to show up, but they don't. Inside this handler, if I keep continueTrace, then the active span inside that callback is a non-recording span, BUT the trace ID matches what's sent in the ClientContext. So it appears this span is getting associated correctly, but it's not being sent for some reason.

I believe the fact that the continueTrace span has the correct trace ID is proof that the headers are making it through to Lambda B, is it not?

Lms24 commented 1 month ago

I'm not sure how it's possible that they would be automatically connected, since these are distributed systems.

The idea is that it works automatically in the following situations:

  • Lambda is run as CommonJS: OpenTelemetry instrumentation used by our SDK continues the trace
  • Lambda is run as ESM: Our wrapHandler function continues the trace

Can you confirm that you run your lambda in ESM (using import syntax)?

Both should pick up our sentry-trace and baggage headers correctly from the incoming request.

it's not the trace ID sent in the ClientContext

Hmm this is interesting and gave me an idea. Maybe there's a detail I'm not aware of for AWS lambda functions. Our handler extracts the tracing headers from the event object. You extract the headers from the context object. Is there a difference between these two? Perhaps our way is outdated by now and we didn't notice because we can't require the headers to be present 🤔

I believe the fact that the continueTrace span has the correct trace ID is proof that the headers are making it through to Lambda B, is it not?

Technically yes. I just didn't see this in the trace you linked (In fact, I'm not sure if this trace is the one from Lambda A or B or what should be in there vs. what isn't. I can only see that there's just one transaction, so it's not linked).

To debug and fix this we need to reproduce this properly. Can you show us how you invoke Lambda B from Lambda A? If possible it'd be great if you could provide a minimal but fully reproducible example. Otherwise we have to do this ourselves which likely ends up different from your version.

kparksalku commented 1 month ago

I'm not sure how it's possible that they would be automatically connected, since these are distributed systems.

The idea is that it works automatically in the following situations:

  • Lambda is run as CommonJS: OpenTelemetry instrumentation used by our SDK continues the trace
  • Lambda is run as ESM: Our wrapHandler function continues the trace

Can you confirm that you run your lambda in ESM (using import syntax)?

Both should pick up our sentry-trace and baggage headers correctly from the incoming request.

it's not the trace ID sent in the ClientContext

Hmm this is interesting and gave me an idea. Maybe there's a detail I'm not aware of for AWS lambda functions. Our handler extracts the tracing headers from the event object. You extract the headers from the context object. Is there a difference between these two? Perhaps our way is outdated by now and we didn't notice because we can't require the headers to be present 🤔

I believe the fact that the continueTrace span has the correct trace ID is proof that the headers are making it through to Lambda B, is it not?

Technically yes. I just didn't see this in the trace you linked (In fact, I'm not sure if this trace is the one from Lambda A or B or what should be in there vs. what isn't. I can only see that there's just one transaction, so it's not linked).

To debug and fix this we need to reproduce this properly. Can you show us how you invoke Lambda B from Lambda A? If possible it'd be great if you could provide a minimal but fully reproducible example. Otherwise we have to do this ourselves which likely ends up different from your version.

Ah, yeah, the headers could be the problem. I've been digging through some of the code in this repo, and it appears the wrapHandler function is pulling headers from the event object. I'm not invoking the Lambda through an API gateway, which I believe is the assumption; I'm invoking it directly with the AWS SDK, so I can't pass custom headers along in the request. Here's the gist of how I'm invoking the Lambda:

    const client = new LambdaClient();
    const command = new InvokeCommand({
      FunctionName: `my-function`,
      InvocationType: "RequestResponse",
      ClientContext: generateLambdaClientContextHeaders(Sentry, Sentry.getActiveSpan()),
    });
    const invokeLambda = await client.send(command);

 function generateLambdaClientContextHeaders(Sentry, span) {
  return Buffer.from(
    JSON.stringify({
      baggage: Sentry.spanToBaggageHeader(span),
      "sentry-trace": Sentry.spanToTraceHeader(span),
    }),
  ).toString("base64");
}

So, if I manually pass the baggage and trace data through the ClientContext, I can retrieve it and (potentially) continue the trace, but I don't think the automatic instrumentation would support that. I'm still confused as to why continueTrace wouldn't work, though, as that seems to be the entire purpose of that function.

And yes, I'm using ESM.

Lms24 commented 1 month ago

I'm still confused as to why continueTrace wouldn't work, though, as that seems to be the entire purpose of that function.

My suspicion is that continueTrace works but the span isn't sent because the Lambda terminates before the SDK can send it. One of the reasons why we recommend using wrapHandler is because it tries to stall the lambda a bit to let the SDK flush and send events.

You can try to manually flush after continueTrace:

export const handler = async (event, context) => {
  const { baggage, sentryTrace } = extractFromLambdaClientContextHeaders(context);

   const result = await Sentry.continueTrace({ baggage, sentryTrace }, async () => {
    console.log(Sentry.getActiveSpan());
    const queryString = "select count(*) from myTable;";

    return await Sentry.startSpan({
      name: queryString,
      op: "db.sql.execute"
    }, async (span) => {
      console.log(span);
      return await executeQuery();
    });
  });
  await Sentry.flush(2000);
  return result;
};

or, another idea would be to actually use wrapHandler in Lambda B (to get all the flushing and timeout logic right) but deactivate automatic trace creation in favour of your continueTrace call:


const _handler = async (event, context) => {
  const { baggage, sentryTrace } = extractFromLambdaClientContextHeaders(context);

  return await Sentry.continueTrace({ baggage, sentryTrace }, async () => {
    console.log(Sentry.getActiveSpan());
    const queryString = "select count(*) from myTable;";

    return await Sentry.startSpan({
      name: queryString,
      op: "db.sql.execute"
    }, async (span) => {
      console.log(span);
      return await executeQuery();
    });
  });
};

export const handler = Sentry.wrapHandler(_handler, {startTrace: false})

Ideally, we'll find (and properly document) a way to continue traces across Lambda functions invoked via the AWS SDK.

Gonna cc @andreiborza for some input here as well.

kparksalku commented 1 month ago

My suspicion is that continueTrace works but the span isn't sent because the Lambda terminates before the SDK can send it. One of the reasons why we recommend using wrapHandler is because it tries to stall the lambda a bit to let the SDK flush and send events.

You can try to manually flush after continueTrace:

export const handler = async (event, context) => {
  const { baggage, sentryTrace } = extractFromLambdaClientContextHeaders(context);

  const result = await Sentry.continueTrace({ baggage, sentryTrace }, async () => {
    console.log(Sentry.getActiveSpan());
    const queryString = "select count(*) from myTable;";

    return await Sentry.startSpan({
      name: queryString,
      op: "db.sql.execute"
    }, async (span) => {
      console.log(span);
      return await executeQuery();
    });
  });
  await Sentry.flush(2000);
  return result;
};

or, another idea would be to actually use wrapHandler in Lambda B (to get all the flushing and timeout logic right) but deactivate automatic trace creation in favour of your continueTrace call:

const _handler = async (event, context) => {
  const { baggage, sentryTrace } = extractFromLambdaClientContextHeaders(context);

  return await Sentry.continueTrace({ baggage, sentryTrace }, async () => {
    console.log(Sentry.getActiveSpan());
    const queryString = "select count(*) from myTable;";

    return await Sentry.startSpan({
      name: queryString,
      op: "db.sql.execute"
    }, async (span) => {
      console.log(span);
      return await executeQuery();
    });
  });
};

export const handler = Sentry.wrapHandler(_handler, { startTrace: false });

Ideally, we'll find (and properly document) a way to continue traces across Lambda functions invoked via the AWS SDK.

Gonna cc @andreiborza for some input here as well.

I tried both of these solutions, and neither worked. I think the issue is less about the timing and more that continueTrace returns a NonRecordingSpan, which, according to the documentation, is not sent to Sentry. I don't see any way to force it to return a normal span.

andreiborza commented 1 month ago

hi @kparksalku,

I just took a stab at reproducing this and found that you can pass the headers on the event via the Payload like so

    const command = new InvokeCommand({
        FunctionName: `my-function`,
        InvocationType: "RequestResponse",
        Payload: JSON.stringify({
            headers: {
                baggage: Sentry.spanToBaggageHeader(span),
                'sentry-trace': Sentry.spanToTraceHeader(span)
            }
        })
    });

That way our sdk picks them up correctly and you should have connected traces. Let me know if this doesn't work for you.

kparksalku commented 1 month ago

hi @kparksalku,

I just took a stab at reproducing this and found that you can pass the headers on the event via the Payload like so

    const command = new InvokeCommand({
        FunctionName: `my-function`,
        InvocationType: "RequestResponse",
        Payload: JSON.stringify({
            headers: {
                baggage: Sentry.spanToBaggageHeader(span),
                'sentry-trace': Sentry.spanToTraceHeader(span)
            }
        })
    });

That way our sdk picks them up correctly and you should have connected traces. Let me know if this doesn't work for you.

Hi @andreiborza, unfortunately, this doesn't work either. I don't believe this approach actually attaches the headers to the HTTP request; it just adds an arbitrary "headers" property to the payload object. I can retrieve these values by parsing the payload in the lambda, but the Sentry SDK doesn't pick up on it.

andreiborza commented 1 month ago

@kparksalku we pick the headers from the event (first argument to the handler) in the sdk, see https://github.com/getsentry/sentry-javascript/blob/develop/packages/aws-serverless/src/sdk.ts#L337-L343

Have you tried? I got a connected trace earlier: Image
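
For reference, the extraction in the linked wrapHandler code is roughly this (paraphrased, not a verbatim copy of sdk.ts; the helper name here is mine):

// Paraphrase of the linked sdk.ts logic: wrapHandler reads the tracing
// headers off the event object, i.e. the handler's first argument.
function readTracingHeadersFromEvent(event) {
  const headers = event && event.headers ? event.headers : {};
  const sentryTrace = typeof headers["sentry-trace"] === "string" ? headers["sentry-trace"] : undefined;
  const baggage = headers.baggage;
  return { sentryTrace, baggage };
}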

kparksalku commented 1 month ago

@kparksalku we pick the headers from the event (first argument to the handler) in the sdk, see https://github.com/getsentry/sentry-javascript/blob/develop/packages/aws-serverless/src/sdk.ts#L337-L343

Have you tried? I got a connected trace earlier: Image

@andreiborza Yes, this is what I tried:

    const activeSpan = Sentry.getActiveSpan();

    const command = new InvokeCommand({
      FunctionName: `my-function`,
      InvocationType: "RequestResponse",
      LogType: "None",
      Payload: JSON.stringify({
        headers: {
          baggage: Sentry.spanToBaggageHeader(activeSpan),
          "sentry-trace": Sentry.spanToTraceHeader(activeSpan),
        },
        pathParameters: { ... },
        body: JSON.stringify({ ... }),
      }),
    });

It does not result in the span showing up in Sentry.

andreiborza commented 1 month ago

@kparksalku right, I think I see my mistake. I specifically start a span here. Is your activeSpan undefined by any chance? It's getting late here in Europe, I will take another stab at better reproducing this by chaining lambdas (I only invoke my single lambda locally via the SDK as described above) on Monday.

For the time being, I think if you wrap your InvokeCommand block in startSpan({ name: 'some-span-name' }, (span) => { ... }) you should be able to get the right trace data.
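
A minimal sketch of that suggestion (the span name and payload shape here are illustrative, not prescribed by the SDK):

import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";
import * as Sentry from "@sentry/aws-serverless";

const client = new LambdaClient();

// Start a span around the invocation so the propagated headers come from a
// recording span rather than from an undefined or non-recording active span.
const response = await Sentry.startSpan({ name: "invoke-lambda-b" }, async (span) => {
  const command = new InvokeCommand({
    FunctionName: "my-function",
    InvocationType: "RequestResponse",
    Payload: JSON.stringify({
      headers: {
        baggage: Sentry.spanToBaggageHeader(span),
        "sentry-trace": Sentry.spanToTraceHeader(span),
      },
    }),
  });
  return client.send(command);
});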

kparksalku commented 1 month ago

@kparksalku right, I think I see my mistake. I specifically start a span here. Is your activeSpan undefined by any chance? It's getting late here in Europe, I will take another stab at better reproducing this by chaining lambdas (I only invoke my single lambda locally via the SDK as described above) on Monday.

For the time being, I think if you wrap your InvokeCommand block in startSpan({ name: 'some-span-name' }, (span) => { ... }) you should be able to get the right trace data.

That activeSpan is defined before invoking Lambda B, and I can see it in Sentry.

andreiborza commented 1 month ago

Thanks for confirming, I'll take a closer look at reproducing this.

andreiborza commented 1 month ago

@kparksalku we had another look through our aws-sdk integration and found that the underlying OpenTelemetry aws-sdk integration propagates the context to the Lambda context object. As you correctly identified, though, our SDK expects the tracing data to come from the event. We're going to fix this.

For the time being, I found a way to get connected traces this way:

  1. Are you following the docs in creating an instrument.mjs file to init Sentry, and specifying it in your NODE_OPTIONS env variable? Note the docs have a mistake: it should be NODE_OPTIONS = --import ./instrument.mjs. The ./ is necessary. (A minimal sketch follows this list.)
  2. You need to wrap the Payload in Buffer.from, e.g. Payload: Buffer.from(JSON.stringify({...}).toString('base64'))
  3. You can remove the continueTrace calls.
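
A minimal sketch of the instrument.mjs from step 1 (options trimmed; the DSN env variable name is illustrative, reuse the init options from the issue description):

// instrument.mjs — loaded before the handler via NODE_OPTIONS="--import ./instrument.mjs"
import * as Sentry from "@sentry/aws-serverless";

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 1.0,
});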

For reference, here are my two Lambda functions:

Lambda A:

import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";
import * as Sentry from "@sentry/aws-serverless";

export const handler = Sentry.wrapHandler(async (event, context) => {
  const span = Sentry.getActiveSpan()
  const headers = {
    baggage: Sentry.spanToBaggageHeader(span),
    'sentry-trace': Sentry.spanToTraceHeader(span)
  }
  const client = new LambdaClient();
  const command = new InvokeCommand({
    FunctionName: `lambda-B-13146`,
    InvocationType: "RequestResponse",
    Payload: Buffer.from(JSON.stringify({
      headers
    }).toString('base64'))
  })
  return client.send(command);
});

Lambda B:

import * as Sentry from "@sentry/aws-serverless";
Sentry.addIntegration(Sentry.postgresIntegration())

export const handler = Sentry.wrapHandler(async (event) => {
  const queryString = "select count(*) from myTable;";
  return await Sentry.startSpan({
    name: queryString,
    op: "db.sql.execute"
  }, async (span) => {
    console.log('executing query', queryString);
  })
})

This should give you a trace looking like this:

Image

Now, this isn't perfect. As you can see, the function.aws.lambda - lambda-B-13146 span should be a child of the http.client span of Lambda A, but at least they share the same trace root and are connected that way.

kparksalku commented 1 month ago

Unfortunately, this still doesn't work. I've tried loading the instrumentation file both from the NODE_OPTIONS environment variable and by exporting the Sentry object from the instrument file and then importing it in the Lambda. I'm actually having better luck by exporting the object rather than using the environment variable.

Also, the solution provided doesn't work, as AWS doesn't expect the Payload to be base64 encoded. I think you're referring to ClientContext. Base64 encoding the Payload results in an error.
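
For reference, the distinction being drawn here, based on the AWS SDK v3 InvokeCommand input (values are illustrative): Payload is plain JSON, while ClientContext is the field that expects base64-encoded JSON.

import { InvokeCommand } from "@aws-sdk/client-lambda";

// Payload is serialized as-is; ClientContext must be base64-encoded JSON.
const command = new InvokeCommand({
  FunctionName: "my-function",
  InvocationType: "RequestResponse",
  Payload: JSON.stringify({ headers: { "sentry-trace": "...", baggage: "..." } }),
  ClientContext: Buffer.from(JSON.stringify({ custom: "data" })).toString("base64"),
});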

andreiborza commented 1 month ago

Could you please explain your whole setup in more detail? How are you getting your Sentry package, how are you initializing Sentry, and how are you executing your Lambdas (from start, to Lambda A, to Lambda B)? I feel like we're missing something here.

Additionally, please enable debug: true in the init and paste logs from both Lambdas.

For my setup above, I have Lambda A and B as pasted and I invoke Lambda A from the AWS UI with an empty event json. I also invoked Lambda A locally via the aws-sdk using a node script that uses InvokeCommand. The traces were connected both times.

kparksalku commented 1 month ago

Could you please explain your whole setup in more detail? How are you getting your Sentry package, how are you initializing Sentry, and how are you executing your Lambdas (from start, to Lambda A, to Lambda B)? I feel like we're missing something here.

Additionally, please enable debug: true in the init and paste logs from both Lambdas.

For my setup above, I have Lambda A and B as pasted and I invoke Lambda A from the AWS UI with an empty event json. I also invoked Lambda A locally via the aws-sdk using a node script that uses InvokeCommand. The traces were connected both times.

I've tried several different approaches, in an effort to get it to work, so I'll do my best to describe everything I've tried.

First, I'm getting this Sentry package from NPM: "@sentry/aws-serverless": "^8.18.0". I'm putting this in a Lambda Layer, along with my instrument file. I posted my config file above. I've tried a couple different approaches in the Lambda functions.

For the first approach, I used NODE_OPTIONS with the Lambda functions to initialize the Sentry code in the Layer, and then I had to import the Sentry code in the Lambda as well. This confused me, because the Sentry object initialized in the Layer is not the same object that is imported in the Lambda function.

For the second approach, I removed the NODE_OPTIONS setup and instead imported my layer code inside Lambda A. In the layer code, I exported the Sentry object after initializing it in the instrument file. This initializes Sentry first thing during the initialization of the Lambda function and ensures that the Lambda function uses the same instance of the Sentry object that is created during initialization.
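
A sketch of that second approach (simplified; file names and the layer import path are illustrative):

// layer: instrument.mjs — initialize Sentry once and re-export the instance
import * as Sentry from "@sentry/aws-serverless";
Sentry.init({ dsn: process.env.SENTRY_DSN, tracesSampleRate: 1.0 });
export default Sentry;

// Lambda A then imports this already-initialized instance from the layer
// (hypothetical path): import Sentry from "/opt/nodejs/instrument.mjs";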

The second approach has given me more consistent results; I can see all of the traces from Lambda A in Sentry.io. Just to clarify, I can see the invocation of Lambda A as the root span, and I can see the subsequent HTTP/RPC call to invoke Lambda B, but both approaches still did not capture the traces from within Lambda B.

I've already enabled debug in my Sentry config (posted above), but I'm not seeing any additional logs. Where would I find that? I couldn't find that information in the documentation.

andreiborza commented 1 month ago

@kparksalku thanks. We just merged a PR that hopefully fixes all of this for you. We're probably going to cut a release today or so.

As for your setup, after we release the new fix, I recommend following the four steps in https://docs.sentry.io/platforms/javascript/guides/aws-lambda/install/esm-npm with setting NODE_OPTIONS to --import @sentry/aws-serverless/awslambda-auto.

Then you need to make sure to wrap your handler with Sentry.wrapHandler.

Here's what my Lambda A looks like:

import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";
import * as Sentry from "@sentry/aws-serverless";

export const handler = Sentry.wrapHandler(async (event, context) => {
  const client = new LambdaClient();
  const command = new InvokeCommand({
    FunctionName: `lambda-B-13146`,
    InvocationType: "RequestResponse",
    Payload: new Uint16Array(),
  })
  return client.send(command);
});

and here's what my Lambda B looks like:

import * as Sentry from "@sentry/aws-serverless";
Sentry.addIntegration(Sentry.postgresIntegration())

export const handler = Sentry.wrapHandler(async (event) => {
  const queryString = "select count(*) from myTable;";
  return await Sentry.startSpan({
    name: queryString,
    op: "db.sql.execute"
  }, async (span) => {
    console.log('executing query', queryString);
  })
})

Both Lambdas have a layer that just uses a node_modules folder that has all packages that npm install @sentry/aws-serverless produces.

That way you should be getting a trace like this: Image

kparksalku commented 1 month ago

Thanks. I'll keep an eye out for the new release.

One concern with the NODE_OPTIONS approach, though: the Sentry object in your Lambda code is not the same instance that is created during instrumentation in the layer, since you're importing the @sentry/aws-serverless package twice. Does that not cause problems for you?

andreiborza commented 1 month ago

@kparksalku that confusion is understandable, but basically our top-level APIs (e.g. startSpan) always get the Sentry carrier object from the global scope, and the instance loaded via NODE_OPTIONS is the first Sentry instance that writes itself to the global scope. If you encounter any specific issues with that, please feel free to file another issue.
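
A small illustration of that point (assuming getClient() is exposed on the package's top-level API):

// Illustrative only: a second import of the package in the handler resolves to the
// same globally registered client that the NODE_OPTIONS-loaded instrument.mjs created.
import * as Sentry from "@sentry/aws-serverless";

export const handler = Sentry.wrapHandler(async () => {
  console.log(Sentry.getClient()); // the client created during instrumentation
});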

andreiborza commented 1 month ago

@kparksalku this has been released, please give it a try when you have some time.

kparksalku commented 1 month ago

@andreiborza, unfortunately, this still doesn't work. After upgrading to version 8.25.0, I instrumented my lambdas and layer the way you suggested. I tried using both @sentry/aws-serverless/awslambda-auto and @sentry/aws-serverless in NODE_OPTIONS. I tried importing @sentry/aws-serverless in both Lambdas, as well as the Layer. I now do not get ANY traces at all if I use the NODE_OPTIONS approach. I logged out the active spans for both Lambdas, and they are nonRecording spans. How do I get them to be normal (recording) spans?

I then tried exporting the Sentry object from the layer, and importing it in the Lambdas. That allows me to get the traces from Lambda A, but still nothing from Lambda B. I logged out the active spans for both Lambdas and (to my surprise), the active spans in Lambda B appear to be normal spans, and they have the correct trace IDs and parent span IDs, and they're no longer nonRecording spans. So, I think that's an improvement, but I don't know why they wouldn't be forwarded to Sentry.

I'm pretty much out of ideas of other things to try.

kparksalku commented 1 month ago

I just realized why I think it's not working for me. Lambda B is in a VPC, and doesn't have internet access. So, as I understand it, the Sentry SDK tries to send the traces back from where the spans are created. If this is true, then the Lambda would have no way to get the traces back to Sentry. That would also explain why I'm seeing regular (recording) spans that are not making it back to Sentry.

Are there any suggested ways to handle this situation, besides re-architecting our solution? Is there a way to send the trace data back to Lambda A in the response, instead of sending it back to Sentry directly?

mydea commented 1 month ago

Ahh, yeah, that makes sense then! I don't think we have a way to accomplish this as of today, sadly. Each function has its own SDK and is responsible for sending its own data. As of today, using the Sentry SDK in an environment without internet access does not work, because even errors could not be sent to Sentry. Maybe @Lms24 or @andreiborza have some ideas around this, though.