Closed: steelbrain closed this issue 1 year ago.
I got the same issue with "@aws-sdk/client-secrets-manager": "^3.53.0", and the same behavior: most of the time everything works, but then it unexpectedly crashes. It crashes at `await secretsManagerClient.send`:
```js
import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager";
import { defaultProvider } from "@aws-sdk/credential-provider-node";

const secretsManagerClient = new SecretsManagerClient({
  // `local`, `AwsProfile`, and `REGION` come from our own configuration.
  credentials: local ? defaultProvider({ profile: AwsProfile }) : undefined,
  region: REGION
});

// Inside a class:
static #mySecrets = async (secretName) => {
  let data;
  try {
    data = await secretsManagerClient.send(
      new GetSecretValueCommand({ SecretId: secretName })
    );
    return data; // For unit tests.
  } catch (err) {
    console.log('err', err);
  }
};
```
Noticing this same behavior on v3.67.0
We have the same error; we've had serious problems with our production environment for a few weeks now. I switched to environment variables and disabled the Secrets Manager client.
Same on our live environment. We initially updated @aws-sdk/client-secrets-manager from version 3.20.0 to 3.52.0, and our lambdas started throwing spikes of the following errors at random intervals throughout the day:
```
error.code         | EPROTO
error.errno        | -71
error.errorMessage | write EPROTO
error.errorType    | Error
error.stack.0      | Error: write EPROTO
error.stack.1      |   at __node_internal_captureLargerStackTrace (internal/errors.js:412:5)
error.stack.2      |   at __node_internal_errnoException (internal/errors.js:542:12)
error.stack.3      |   at WriteWrap.onWriteComplete [as oncomplete] (internal/stream_base_commons.js:94:16)
error.syscall      | write
errorType          | AwsError
stack.0            | AwsError
stack.1            |   at /var/task/packages/aws/dist/secretsManager/secretsManager.js:11:38
stack.2            |   at processTicksAndRejections (internal/process/task_queues.js:95:5)
stack.3            |   at async Promise.all (index 1)
```
We then upgraded to 3.89.0, thinking the issue may have been fixed in the meantime, but we are seeing the same behavior.
Update: downgrading back to version 3.20.0 seems to have resolved it for now.
We are also seeing this issue on v3.130.0 and have opted for the workaround of downgrading to 3.20.0. Any updates, @RanVaknin?
@AllanZhengYP @RanVaknin is this an issue you've seen? We have been seeing it quite a few times in recent weeks, with a very recent aws-sdk v3:
```
ERROR  Error: write EPROTO
    at WriteWrap.onWriteComplete [as oncomplete] (internal/stream_base_commons.js:94:16) {
  errno: -71,
  code: 'EPROTO',
  syscall: 'write',
  '$metadata': { attempts: 1, totalRetryDelay: 0 }
}
```
@AllanZhengYP @RanVaknin we've investigated this a bit more. It seems that we are seeing the EPROTO error after the lambda times out, and then tries to re-initialise (i.e. we see our cold start code again in the same log group).
We recently began moving a variety of microservices from AWS SDK v2 to v3 and have seen flavors of this error in several repos. Most recently with 3.154.0
Hi All,
Unfortunately I'm not able to reproduce this issue. We have multiple issues open for the same EPROTO error; I tried reproducing with two customer examples and never ran into it. I've assigned it to the dev team to take a look.
One of my colleagues had done some analysis and suspects the issue is due to the clock being momentarily wrong when the lambda starts up.
I too have encountered this over and over and my educated guess is that this happens when some (unrelated) code blocks the event loop a bit too long. AWS services seem to have short timeouts when dealing with connections and the SDK does not retry them, so blocking the JS event loop would delay the connection handling and cause the connection to fail with this error.
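If it helps anyone test that hypothesis, here is a minimal sketch (not from this thread; the interval and threshold values are arbitrary) that logs whenever the Node.js event loop is blocked noticeably longer than expected:

```js
// Schedule a timer at a fixed interval and measure how late it actually fires.
// Large lag values indicate that something blocked the event loop.
const EXPECTED_MS = 100;
let last = Date.now();

setInterval(() => {
  const now = Date.now();
  const lag = now - last - EXPECTED_MS;
  if (lag > 250) {
    console.warn(`Event loop was blocked for roughly ${lag} ms`);
  }
  last = now;
}, EXPECTED_MS);
```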
Having the same issue.
@RanVaknin What's the status here? This is crashing mission-critical processes for us, and it's been assigned P1 for over a month...
@RanVaknin ?????
On a quick revisit during a review meeting for issues with p1 labels, we noticed that this issue is likely in Node.js. Search results: https://github.com/search?q=repo%3Anodejs%2Fnode+EPROTO&type=issues
We need to find out whether the issue is with the Node.js setup which Lambda follows, some Node.js configuration which the SDK sets, or a bug in Node.js core itself.
The requirement is to provide minimal repro code which makes multiple Secrets Manager getSecretValue calls (a sketch follows below). This will help us log more information and find out whether the issue is specific to Lambda, Node.js, or the SDK.
For reference, here is a package which attempted to repro an npm ping test failure from CodeBuild: https://github.com/trivikr/aws-codebuild-npm-ping-test
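Along those lines, a minimal repro of the kind being requested might look like the following sketch; the secret names, region, and degree of parallelism are placeholders, not taken from anyone's setup in this thread:

```js
import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager";

const client = new SecretsManagerClient({ region: "us-east-1" });

export const handler = async () => {
  // Fire several GetSecretValue calls in parallel to raise the odds of
  // hitting the intermittent EPROTO failure within a single invocation.
  const results = await Promise.allSettled(
    ["secret-a", "secret-b", "secret-c"].map((SecretId) =>
      client.send(new GetSecretValueCommand({ SecretId }))
    )
  );

  const failed = results.filter((r) => r.status === "rejected");
  if (failed.length > 0) {
    console.error("Failures:", failed.map((r) => r.reason));
  }
  return { total: results.length, failed: failed.length };
};
```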
Has anyone found that using a newer version of Node makes this issue go away? I am planning on upgrading my version of Node, but I was curious if anyone else has already tried this.
Like the OP, I am also using 14.x, but I am planning on updating to 18.x.
We are also seeing this error regularly now and are wondering if a Node upgrade would help. We are also on Node 14 with the latest SDK packages, and we hit it when trying to assume a role with stsClient.send(assumeRoleCommand).
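For context, the call in question is the standard v3 STS assume-role flow; a rough sketch (the role ARN, session name, and region here are placeholders) looks like this:

```js
import { STSClient, AssumeRoleCommand } from "@aws-sdk/client-sts";

const stsClient = new STSClient({ region: "us-east-1" });

async function assumeExampleRole() {
  const assumeRoleCommand = new AssumeRoleCommand({
    RoleArn: "arn:aws:iam::123456789012:role/example-role",
    RoleSessionName: "example-session",
  });
  // The intermittent write EPROTO error surfaces from this send() call too.
  const { Credentials } = await stsClient.send(assumeRoleCommand);
  return Credentials;
}
```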
We also see this error fairly frequently with @aws-sdk/client-secrets-manager v3.131.0 and a Node 14.x lambda environment.
It looks like the following issues are closely related, which implies it may not exclusively be a Secrets Manager issue:
Regularly getting this with the SSM client v3.229.0, NodeJS 14. Seems like it's a global issue across many of the clients
Yesterday, after posting the above comment, I decided to upgrade my lambdas to Node 16, and so far this hasn't happened again. I might be speaking too soon, but @RanVaknin this may be something to pass on to the dev team investigating.
cc @hikarunoryoma (since you asked)
@dgoemans Thanks for the heads up! Looking forward to upgrading my lambdas next month and will follow up if I see success on my end!
We tried the Lambda extension for fetching secrets from Secrets Manager and that has worked quite well: https://docs.aws.amazon.com/secretsmanager/latest/userguide/retrieving-secrets_lambda.html
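For anyone curious, the extension serves secrets over a local HTTP endpoint inside the Lambda execution environment instead of going through the SDK client. A rough sketch based on the linked docs (the port and header name are the extension's documented defaults, and the response body mirrors GetSecretValue):

```js
import http from "node:http";

// Fetch a secret from the AWS Parameters and Secrets Lambda Extension.
const getSecretViaExtension = (secretId) =>
  new Promise((resolve, reject) => {
    const options = {
      hostname: "localhost",
      port: process.env.PARAMETERS_SECRETS_EXTENSION_HTTP_PORT || 2773,
      path: `/secretsmanager/get?secretId=${encodeURIComponent(secretId)}`,
      headers: { "X-Aws-Parameters-Secrets-Token": process.env.AWS_SESSION_TOKEN },
    };
    http
      .get(options, (res) => {
        let body = "";
        res.on("data", (chunk) => (body += chunk));
        res.on("end", () => resolve(JSON.parse(body).SecretString));
      })
      .on("error", reject);
  });
```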
An upgrade to Node 18 appears to have resolved this for us.
Indeed, 6 weeks after upgrading to Node 16 we haven't seen the issue again. Seems to be Node 14 only.
I updated from Node 14 to Node 18 and no longer see this issue! Agreed that this is some issue with Node interfacing with the latest AWS SDK.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.
Describe the bug
We're using Secrets Manager to initialize lambda state, and are frequently getting `write EPROTO` failure messages. It started happening recently after we upgraded from v3.41.0 to v3.58.0.
Your environment
SDK version number
@aws-sdk/client-secrets-manager@3.58.0
Is the issue in the browser/Node.js/ReactNative?
Node.js
Details of the browser/Node.js/ReactNative version
Node.js 14.x
Lambda :)
Steps to reproduce
Here's a tl;dr of the lambda handler code:
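(The original snippet did not come through here; the following is a reconstruction for illustration only, built from the `promiseEnv` name and the log messages in this report, with everything else assumed.)

```js
import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager";

const client = new SecretsManagerClient({ region: process.env.AWS_REGION });

// Kicked off at module load (cold start) so the secret is in flight before the first invoke.
console.log("Requesting environment variables");
const promiseEnv = client.send(
  new GetSecretValueCommand({ SecretId: process.env.SECRET_NAME })
);

export const handler = async () => {
  const { SecretString } = await promiseEnv; // the reported crash point
  console.log("Got environment variables");
  return { ok: Boolean(SecretString) };
};
```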
Observed behavior
Most of the time everything works, but then it unexpectedly crashes at `await promiseEnv`, and `Got environment variables` is never logged.
Expected behavior
Secrets Manager would keep working
Screenshots
N/A
Additional context
Here are the raw logs:
```console
[TS] [UUID] INFO  Requesting environment variables
[TS] [UUID] ERROR Invoke Error {"errorType":"Error","errorMessage":"write EPROTO","code":"EPROTO","errno":-71,"syscall":"write","$metadata":{"attempts":1,"totalRetryDelay":0},"stack":["Error: write EPROTO","    at WriteWrap.onWriteComplete [as oncomplete] (internal/stream_base_commons.js:94:16)","    at WriteWrap.callbackTrampoline (internal/async_hooks.js:130:17)"]}
[TS] [UUID] ERROR (node:9) PromiseRejectionHandledWarning: Promise rejection was handled asynchronously (rejection id: 14)
(Use `node --trace-warnings ...` to show where the warning was created)
END RequestId: [UUID]
REPORT RequestId: [UUID] Duration: 33.91 ms  Billed Duration: 34 ms  Memory Size: 1536 MB  Max Memory Used: 99 MB  Init Duration: 1213.34 ms
```