getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License
8.02k stars 1.59k forks source link

Lambda timeout warning inaccurate #9810

Open thchia opened 11 months ago

thchia commented 11 months ago

Is there an existing issue for this?

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using?

@sentry/serverless

SDK Version

7.86.0

Framework Version

Node 16

Link to Sentry event

https://chocolate-pte-ltd.sentry.io/issues/4717944733/?alert_rule_id=14826786&alert_type=issue&environment=production&notification_uuid=2ae1d5f6-f15f-4b59-b1c6-33aee81f6f27&project=4506177582137344&referrer=slack

SDK Setup

Initialisation:

Sentry.AWSLambda.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.SENTRY_ENVIRONMENT,
});

Handler Instrumentation:

export const handler = Sentry.AWSLambda.wrapHandler(
  (event) => { ... },
  {
    timeoutWarningLimit: 48000 // 80% of the 1min timeout
  }
)

Steps to Reproduce

The lambda in the linked event has a timeout of 1 minute. We configure the timeout warning to occur at 80% (48 seconds). The lambda in the linked event ran for 20s and we received the timeout warning.

Expected Result

No warning at 20s runtime

Actual Result

Please see linked event

Lms24 commented 11 months ago

Hi @thchia thanks for writing in!

I'm not too familiar with AWS Lambda so I tried finding out how we calculate this. It seems we take this timeout from the context object passed into the lambda handler and we subtract the timeoutWarningLimit from it.

https://github.com/getsentry/sentry-javascript/blob/a063fbccd79cab77da627e45166e50ddcd96928c/packages/serverless/src/awslambda.ts#L282-L284

So taking your configuration and assuming the tryGetRemainingTimeInMillis returns something close to 60s (obv in milliseconds but let's stay in seconds for simplicity), we subtract 48s and hence, the warning should be captured after ~12s. This is a bit weird and possibly buggy on our end because I'd also have assumed that it'd capture the warning at 48s. Will confirm with the team and if we agree that this is a bug, fix it.

To test if this is indeed what's going wrong here, I'd recommend setting the limit to 12_000 instead and checking if the warning is now emitted after 48s.

thchia commented 11 months ago

Hi @Lms24 thanks for the update.

After looking at the code snippet, it indeed seems like the timeoutWarningLimit we specify is supposed to be the amount of time left before sending a warning, as opposed to the amount of time elapsed. If that is the case, then it's fine, though I can suggest the documentation make this clear.

Let me confirm and get back to you.

Lms24 commented 11 months ago

@thchia you beat me to the answer 😅 I discussed this with team members and we came to the conclusion that the intention behind the timeoutWarningLimit is indeed to subtract a certain "offset" from the remaining time returned from the context. This is used to increase the likelihood of the message being sent before the lambda times out and causes the execution freeze. You can configure it to the 12s though to send the timeout on 80% - this should work!

And yes, we need to point this out more clearly in the documentation - you're absolutely right. Will follow up soon.