Open Max101 opened 12 months ago
We're seeing this too; we're just executing in a normal Node.js context outside of Lambda, so perhaps it's more widespread. However, it also throws the exception to the caller, so it crashes the executing context for us.
Using Node.js v20.9.0. Installed library version is dd-trace@4.19.0.
Error: socket hang up
at connResetException (node:internal/errors:721:14)
at TLSSocket.socketOnEnd (node:_http_client:519:23)
at TLSSocket.emit (node:events:526:35)
at TLSSocket.emit (node:domain:488:12)
at TLSSocket.emit (/app/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:61:25)
at endReadableNT (node:internal/streams/readable:1408:12)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
@Harmonickey do you know what led to the exception? For example, an outgoing request?
We are also suffering from this issue, although we are not hitting it only on Lambdas. Most of the time it is related to DynamoDB calls.
We are using dd-trace-js v5.1.0
Error: socket hang up
at connResetException (node:internal/errors:720:14)
at TLSSocket.socketOnEnd (node:_http_client:525:23)
at TLSSocket.emit (node:events:529:35)
at TLSSocket.emit (/usr/src/api/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:69:25)
at endReadableNT (node:internal/streams/readable:1400:12)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
We are also seeing this issue; it occurs from a Lambda making POST calls to DynamoDB.
Error: socket hang up
at connResetException (node:internal/errors:720:14)
at TLSSocket.socketCloseListener (node:_http_client:474:25)
at TLSSocket.emit (node:events:529:35)
at TLSSocket.emit (/opt/nodejs/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:61:25)
at node:net:350:12
at TCP.done (node:_tls_wrap:614:7)
at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
CDK Construct: v1.8.0
Extension version: 48
Same issue here. Not from a lambda, just when doing dynamodb calls.
Using 5.2.0
Error: socket hang up
at connResetException (node:internal/errors:720:14)
at TLSSocket.socketOnEnd (node:_http_client:525:23)
at TLSSocket.emit (node:events:529:35)
at TLSSocket.emit (/app/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:61:25)
at endReadableNT (node:internal/streams/readable:1400:12)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
Hi, we are experiencing it too, using 3.33.0
Error: socket hang up
at connResetException (node:internal/errors:705:14)
at TLSSocket.socketCloseListener (node:_http_client:467:25)
at TLSSocket.emit (node:events:525:35)
at TLSSocket.emit (/var/task/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:61:25)
at node:net:301:12
at TCP.done (node:_tls_wrap:588:7)
at TCP.callbackTrampoline (node:internal/async_hooks:130:17)
All from the AWS SDK, and all doing DynamoDB calls? Which version of the aws-sdk is everyone using?
@astuyve here we have 2 different versions:
"@aws-sdk/client-dynamodb": "^3.387.0",
"@aws-sdk/lib-dynamodb": "^3.387.0",
"@aws-sdk/smithy-client": "^3.374.0",
and
"@aws-sdk/client-dynamodb": "=3.40.0",
"@aws-sdk/lib-dynamodb": "=3.40.0",
"@aws-sdk/smithy-client": "=3.40.0",
@astuyve We use version 3.362.0, that is provided by the lambda nodejs runtime.
@astuyve Here you have:
"@aws-sdk/client-dynamodb": "3.474.0",
"@aws-sdk/util-dynamodb": "3.474.0",
So far everyone is using the v3 sdk, has anyone reproduced this with v2?
@astuyve can we do something for v3 in the meantime, while no one with v2 answers here? 🙏🏻
Hi @viict - I'm not sure there's something specific we can do right now. I was hoping someone could replicate with AWS SDK v2 or demonstrate definitively that ddtrace is causing this issue.
Instead, it seems that dd-trace is recording that the TCP connection was closed by the server without a response. I noticed other users reporting the same issue. The aws-sdk author also closed this issue as something that can happen.
I could certainly be wrong here, but I'm still not sure what exactly we'd change in this project at this time.
Does anyone have a minimally reproducible example? Does removing dd-trace solve this definitively? Does this impact application code, or is it successful on retries?
Thanks!
@astuyve oh I understand that of course. I'll see what I can do to improve and share here as well if I'm able to answer any of these questions.
@Harmonickey do you know what led to the exception? For example, an outgoing request?
It was an outgoing request from the dd-trace library to DataDog sending an 'info' message.
Here is my initial configuration in case that helps.
const { createLogger, format, transports } = require('winston');

const httpTransportOptions = {
  host: 'http-intake.logs.datadoghq.com',
  path: `/v1/input/${environment.datadog.apiKey}?ddsource=nodejs&service=${service}`
    + `&env=${environment.name}&envType=${isWorkerEnv ? 'work' : 'web'}`,
  ssl: true,
};

const logger = createLogger({
  level: 'info',
  exitOnError: false,
  format: format.json(),
  transports: [
    new transports.Http(httpTransportOptions),
  ],
});
Then during runtime, calling logger.info('some string message') is when it threw the exception. The message is a static string, and it does not always throw.
Because I haven't seen this error in a while, I suspect it was due to the DataDog intake servers just being overloaded, so the connection wasn't responded to quickly enough and threw the socket hang up error. Perhaps DataDog has fixed it since then and improved their response times.
@tlhunter any updates here?
We are getting a lot of socket hang up errors recently; we are using version 4.34.0 of dd-trace.
[HPM] ECONNRESET: Error: socket hang up
at connResetException (node:internal/errors:720:14)
at Socket.socketCloseListener (node:_http_client:474:25)
at Socket.emit (node:events:529:35)
at Socket.emit (node:domain:552:15)
at Socket.emit (/usr/src/app/node_modules/@letsdeel/init/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:69:25)
at TCP.<anonymous> (node:net:350:12)
at TCP.callbackTrampoline (node:internal/async_hooks:128:17) {
code: 'ECONNRESET'
}
We are also experiencing this issue using:
"dd-trace": "^5.6.0"
Error: socket hang up
at Socket.socketOnEnd (node:_http_client:524:23)
at Socket.emit (node:events:530:35)
at Socket.emit (/opt/nodejs/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:69:25)
at endReadableNT (node:internal/streams/readable:1696:12)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
We are having the same issue with the latest version of dd-trace, v4.36.0.
Did you switch from node18 to node20? In Node 19 they changed the keep-alive default: https://nodejs.org/en/blog/announcements/v19-release-announce#https11-keepalive-by-default This led to a number of issues, some outlined here: https://github.com/nodejs/node/issues/47130
We see this around calls to AWS services (SNS, SQS, etc.); they all self-heal with the SDK retry logic. What is unclear to me is whether this is an error from dd-trace, or whether dd-trace is just surfacing the issue from the AWS call.
Error: socket hang up
at TLSSocket.socketOnEnd (node:_http_client:524:23)
at TLSSocket.emit (node:events:530:35)
at TLSSocket.emit (/opt/nodejs/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:69:25)
at endReadableNT (node:internal/streams/readable:1696:12)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
Here is the info tab from this same raw error:
@astuyve we are experiencing the same problem but not related to an AWS SDK issue, and I've been able to track it down to a timeout on an API call.
We are using axios for requests, so the package.json file has:
{
"dependencies": {
// ...
"axios": "^1.6.7",
// ...
"datadog-lambda-js": "^7.96.0",
"dd-trace": "^4.26.0",
// ...
"serverless": "^3.38.0",
// ...
},
"devDependencies": {
// ...
"serverless-plugin-datadog": "^5.56.0",
// ...
},
// ...
}
We deploy with serverless, which has:
# ...
frameworkVersion: '3'
plugins:
- serverless-plugin-datadog
provider:
name: aws
architecture: arm64
runtime: nodejs16.x
custom:
version: '1'
datadog:
addExtension: true
apiKey: ${env:DD_API_KEY, ''}
service: public-charging-api
env: ${opt:stage}
version: ${env:DD_VERSION, ''}
enableDDTracing: true
# ...
We have an API call that uses axios in a pretty normal way, like this:
const response: AxiosResponse = await axios.request({
method: 'GET',
url,
headers: { authorization },
timeout: 20000,
});
(That's wrapped in a try/catch, so we know exactly what we are logging in any case.)
Functionally: we have a Lambda that makes ~50 HTTP requests in a very short amount of time, and sometimes a dozen of them will take too long to resolve, so in that Lambda execution we are timing out those requests.
For every request that is aborted by axios due to timeout, we are getting this "Error: socket hang up" log.
The "third party frames" makes me suspect that it's the DataDog layer adding these.
Thanks Tobias!! that's a great clue, @tlhunter any thoughts here?
I can confirm @saibotsivad's observations as well.
We are getting this same issue with EventBridge calls on Node18 lambdas. Lambdas execute with no issues, but dd-trace throws up the same 'socket hang up' error in our Traces
We're getting this error running in EKS with DD JS version v5.12.0, which is causing our health checks to fail because it's taking > 3 seconds to finish a request. The root cause is a delay before this socket hang up.
Hi, I can see this issue has popped up a few times in the past, but it seems like it's been resolved, so I am opening a new issue.
We are experiencing multiple
Error: socket hang up
errors in traces, BUT not in logs. Our Lambda finishes successfully, and there are no errors in the logs. However, where the issue is quite visible is in APM: we have thousands of similar logs across most of our services. We went to analyze our code and really cannot seem to find an issue. Additionally, if this were an issue in our code, it would break, no?
We are on Lambda using Node.js nodejs16.x
Installed library version is dd-trace@4.4.0
Installed DD constructs: "datadog-cdk-constructs-v2": "1.7.4"
We are using SST v2 (Serverless Stack) to deploy our Lambda code.
Our DD config looks like this: