bbc / sqs-consumer

Build Amazon Simple Queue Service (SQS) based applications without the boilerplate
https://bbc.github.io/sqs-consumer/
Other
1.74k stars 333 forks source link

[Bug]: `MaxListenersExceededWarning` when using newer AWS SDK version #500

Closed peter-asztalos-mf closed 2 months ago

peter-asztalos-mf commented 3 months ago

Describe the bug

Hi,

Recently we tried to upgrade our dependencies, as part of this we tried to upgrade @aws-sdk/client-sqs from v3.576.0 to v3.600.0. After the upgrade I noticed we are getting MaxListenersExceededWarning warnings.

After looking into it turns out that there were changes in one of the SDK dependencies and the issue can be triggered starting from version v3.582.0. The change https://github.com/smithy-lang/smithy-typescript/pull/1308 .

I believe the issue is due to reusing the same abort controller in each request, if I set a new AbortController after a successful poll the error goes away.

The full warning:

(node:27582) MaxListenersExceededWarning: Possible EventTarget memory leak detected. 11 abort listeners added to [AbortSignal]. Use events.setMaxListeners() to increase limit
    at EventTarget.[kNewListener] (node:internal/event_target:571:17)
    at EventTarget.[kNewListener] (node:internal/abort_controller:240:24)
    at EventTarget.addEventListener (node:internal/event_target:684:23)
    at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/node-http-handler/dist-cjs/index.js:331:23
    at new Promise (<anonymous>)
    at _NodeHttpHandler.handle (<project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/node-http-handler/dist-cjs/index.js:248:12)
    at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/smithy-client/dist-cjs/index.js:192:35
    at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/middleware-serde/dist-cjs/index.js:33:30
    at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/core/dist-cjs/index.js:165:24
    at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/middleware-retry/dist-cjs/index.js:320:38
    at <project>/node_modules/@aws-sdk/middleware-sdk-sqs/dist-cjs/index.js:92:18
    at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@aws-sdk/middleware-logger/dist-cjs/index.js:34:22
    at Consumer.receiveMessage (<project>/node_modules/sqs-consumer/dist/cjs/consumer.js:214:28)

Your minimal, reproducible example

https://codesandbox.io/p/sandbox/upbeat-jerry-k6pplg

Steps to reproduce

1 .Get the example from the Code Sandbox

  1. Start docker compose to run SQS mock service
  2. Start the example script: NODE_OPTIONS="--trace-warnings" DEBUG=sqs-consumer npm run start
  3. observe the warning

Otherwise:

  1. upgrade to newer a @aws-sdk/client-sqs version (>= 3.582.0)
  2. set the WaitTimeSeconds in the ReceiveMessageCommand to 1 second to trigger the warning faster
  3. observe the warning

Expected behavior

There shouldn't be warning.

How often does this bug happen?

Every time

Screenshots or Videos

No response

Platform

Package version

v10.3.0

AWS SDK version

>=3.582.0

Additional context

No response

nicholasgriffintn commented 3 months ago

Hey, thanks for raising an issue.

3.6 of the sdk would be a major version, we don't naturally support major version releases as they come out as they come with breaking changes like this.

It would be something that we'd have to look at for our next major/minor release ourselves.

I am rather busy at the moment though, so I would suggest staying on 3.5 for a little while longer.

We'd be happy to take a pr though if this is an issue for anyone.

peter-asztalos-mf commented 3 months ago

Hej, thanks for the quick response.

I can reproduce this issue just upgrading to 3.582.0 which if I understood correctly is part of 3.5, right? šŸ¤”

I can look into it and open a PR. My fix during the test was very simple: passing new AbortController per request. Do you see any issue with that?

nicholasgriffintn commented 3 months ago

Ahh sorry, I thought you said it was 3.6, the pr you linked also seems like a breaking change to me.

If I have time I might be able to look into it. In terms of a possible solution, that might be if, I've not looked too deep though.

erichkuba commented 3 months ago

I'm having the same issue and can confirm that this is not just a warning but continues to leak until the process is out of memory.

nicholasgriffintn commented 3 months ago

If I have spare time this week I'll take a look, unfortunately, until then, I'd have to ask that everyone bares with.

If anyone has the time themselves to look, that'd be super helpful, but if not, you'll have to wait on me. Unfortunately, it has come at a particularly busy time for the BBC.

erichkuba commented 3 months ago

I've possibly been able to resolve this by pinning the @smithy/node-http-handler package (which is used internally by @aws-sdk) to version 3.0.1

With this, I have been able to upgrade the @aws-sdk packages to version ^3.600.0. This is not a long-term solution but provides a bridging solution until the root cause is addressed.

So, in short, add the following to your dependencies in your package.json file:

    "@smithy/node-http-handler": "3.0.1",

Note: After doing this, you will need to delete your package-lock.json file, or npm will install the version pinned in the lock file, which may not be the same as the version you have pinned in your package.json file.

I've deployed this change into our systems and will monitor memory usage over the next few days, but the warning no longer presents.

I hope this is helpful.

erichkuba commented 3 months ago

After reviewing the impact of the above fix, it does appear that the leak has been resolved, so I'm happy to recommend it as a temporary solution until a more permanent fix is available.

SQS-Consumer-Memory-Leak-Fix

nicholasgriffintn commented 3 months ago

Awesome! So it is a change in the next version that causes the issue, that makes it easier to figure out what's going on, thanks!

(This one: https://github.com/smithy-lang/smithy-typescript/releases/tag/%40smithy/node-http-handler%403.1.0 which is the change mentioned at the top, spot on!)

nicholasgriffintn commented 3 months ago

On further research, this may be because the Aborts are not automatically being cleaned up, there's an old Node issue around this: https://github.com/nodejs/node/issues/46525

Possibly because smithy is adding event listeners: https://github.com/smithy-lang/smithy-typescript/pull/1308/files#diff-98216b22e6b39a1cd6f2ab013304d4964152b554c4ad8fee4182a961b42285d8R192 but not removing them?

Update: I've opened an issue around this here: https://github.com/smithy-lang/smithy-typescript/issues/1318

nicholasgriffintn commented 3 months ago

Update on this, I have published 11.0.0-canary.1 with the latest version of the aws-sdk to test this with that new version and I've made a new example that sets up Docker with localstack for a truer test: https://github.com/bbc/sqs-consumer-starter/tree/main/examples/docker

With this, I have replicated the issue: Screenshot by Dropbox Capture

Still trying to figure out the best way to resolve it though.

joseamt commented 3 months ago

@erichkuba Still I have the issue.

Node version : 18.20.2 "@aws-sdk/client-sqs": "^3.600.0", "@ssut/nestjs-sqs": "^2.1.0",

Screenshot 2024-06-21 113538

I tried with @smithy/node-http-handler package and still the issue persists.

nicholasgriffintn commented 3 months ago

Hey, I'm still waiting on a response from AWS on the issue I raised above.

Outside of that, the current suggestion is to do what was described here: https://github.com/bbc/sqs-consumer/issues/500#issuecomment-2185502164, or pin your AWS SDK version to before this change.

Outside of that, if anyone has any ideas for how it might be resolved in SQS Consumer, feel free to open a PR, we are in an election period at the moment, so I'm strapped for time.

Could we avoid adding "me too" comments though please? This just fills up the thread and makes it unreadable for other people coming in.

erichkuba commented 3 months ago

@joseamt Thanks for bringing this up. I could have been clearer in https://github.com/bbc/sqs-consumer/issues/500#issuecomment-2185502164, and have updated it accordingly.

You need to pin the @smithy/node-http-handler package in your package,json, AND delete your package-lock.json, or the version in the package-lock.json file will be used.

I hope that gets you across the line.

joseamt commented 3 months ago

@erichkuba Thanks for the update, it worked šŸ‘ šŸ‘ šŸ‘ !!!. Please add the comments here if you get any updates from AWS. Once again thanks for the great help

cheers šŸ‘ šŸ’Æ

nicholasgriffintn commented 3 months ago

A PR has been raised by another member of the BBC that may fix this: https://github.com/smithy-lang/smithy-typescript/pull/1331

Still waiting on a response from AWS on the issue, but hopefully they'll review this as a fix.

paulbrimicombe commented 2 months ago

I had a little time to mess around with this today. I ran the docker example @nicholasgriffintn linked above (but not with tsx ā€” I compiled the TypeScript with tsc and then ran the JS code). The consumer options are:

const options: ConsumerOptions = {
  queueUrl: "http://127.0.0.1:4566/000000000000/sqs-consumer-test",
  sqs: new SQSClient({
    endpoint: "http://127.0.0.1:4566/000000000000/sqs-consumer-test",
    // This is the latest version of @smithy/node-http-handler
    requestHandler: new NodeHttpHandler({
      // Disable keepAlive in order to keep as little state around in memory as possible
      httpsAgent: new https.Agent({ keepAlive: false }),
      httpAgent: new http.Agent({ keepAlive: false }),
    }),
  }),
  // Artificially fast in order to test the memory pressure over time
  pollingWaitTimeMs: 5,
  waitTimeSeconds: 0.01,
  postReceiveMessageCallback: async () => {
    requests += 1;
    if (requests % 20 === 0) {
      // Print the memory usage
      console.log(`${requests}\t\t${process.memoryUsage().heapUsed}`);
    }
  },
  handleMessage: async () => {},
};

Then I run the scripts with:

tsc && \
  # Artificially limit the memory available to ensure that the Garbage Collector kicks in regularly \
  NODE_OPTIONS='--max-old-space-size=14' \
  AWS_REGION=eu-west-1 AWS_ACCESS_KEY_ID=123 AWS_SECRET_ACCESS_KEY=456 \
  node ./index.js > some-data-file.tsv

Results

Here are some results for:

image

Now the interesting bit....

These results are for:

image

Discussion / Conclusions

In the graph above, it's clear that the best memory performance comes when not using any AbortSignal ā€” this is not at all surprising. While it might be tempting to try to infer some differences between the performance of the other options, I think that any differences are due to different Garbage Collection processing and that if I re-ran the tests, any differences would be within the measurement error.

From these measurements, I can say that we could make changes to sqs-consumer to fix this issue without any changes to the @smithy/node-http-handler library:

In the meantime I'll raise another PR for the @smithy/node-http-handler library that removes the event listeners from the AbortSignal when the request has completed (this is slightly more tricky for the fetch handler than the http and https handlers).

paulbrimicombe commented 2 months ago

As promised: https://github.com/smithy-lang/smithy-typescript/pull/1332

0xMALVEE commented 2 months ago

@paulbrimicombe is this error fixed?

nicholasgriffintn commented 2 months ago

I've just ran our Docker setup for 5 minutes (https://github.com/bbc/sqs-consumer-starter/tree/main/examples/docker), and got no errors.

This is with 11.0.0-canary.6, but this is just a version to indicate our support of the new AWS SDK major version, on your side, you'll just need to make sure your package lock lists 3.1.2 or above for the package @smithy/node-http-handler.

If we don't see any further issues, we'll release v11 to the latest tag to indicate our support.

Thanks @paulbrimicombe btw!

nicholasgriffintn commented 2 months ago

Hey, we've now released v11.0.0 of the package (https://github.com/bbc/sqs-consumer/releases/tag/v11.0.0).

I'm going to close this issue as it looks like the update the node-http-handler fixed the issue, please feel free to comment if the issue crops up again, if we receive no further comments, this issue will be auto locked in 30 days.

joseamt commented 2 months ago

@nicholasgriffintn @paulbrimicombe @erichkuba Thanks for the update and support!! šŸ‘ šŸ’Æ

github-actions[bot] commented 1 month ago

This issue has been closed for more than 30 days. If this issue is still occuring, please open a new issue with more recent context.