Closed astorm closed 3 years ago
the below badges are clickable and redirect to their specific view in the CI or DOCS
#### Build stats * Start Time: 2021-10-01T06:10:08.173+0000 * Duration: 3 min 20 sec * Commit: 947bfcd6260700d62e90a5323ebd373156be1444 #### Test stats :test_tube: | Test | Results | | ------------ | :-----------------------------: | | Failed | 0 | | Passed | 16989 | | Skipped | 0 | | Total | 16989 |
Is adding a test for this behaviour reasonable? Or will that come from tests in the agent using this?
@trentm
Is adding a test for this behaviour reasonable?
Open to it, but when I tried to reason though a test case that would exercise this code in a meaningful way I came up short. Did you have something specific in mind?
Regarding a test case for this: I think it would be possible to reproduce what the Node.js Lambda Runtime is doing here, but that efforts are, at least initially, best spent on getting test cases in apm-agent-nodejs.git using this and testing with each type of Node.js lambda function handler: async, sync (i.e. using a callback), sync (with context.callbackWaitsForEmptyEventLoop = false
); and playing with having a short setTimeout
to artificially have an active handle.
My understanding is we should be able to setup tests for these running https://hub.docker.com/r/amazon/aws-lambda-nodejs in Docker.
Also, I think we should get those tests going in a PR on the agent (using these changes to the http-client) before this PR is merged, because I'm not sure the solution here is complete.
Looking at the AWS Lambda Node.js Runtime shows how the runtime handles the "freeze when there are no more active handles" for the callback-style sync handler:
It is using a "beforeExit" event handler (registered here). That's fine. However, the http-client is also using a "beforeExit" handler:
which initiates (a) this._intakeRequestGracefulExitFn()
which is good and (b) this.end()
which I think is bad. I would think this means that either right at the end of the first invocation of the function in this VM or right at the start of the second invocation the http Client
will destroy itself.
It isn't clear from your comment on #165:
We've observed when using the lambda callback based API (vs. using an async function or a returned promise) that the client's stream can be rendered un-writable.
if this might be what you were seeing. Are you able to provide more details and a repro of that observation?
I think the possible "beforeExit" interaction should be looked into before merging this PR.
agent.flush()
in our lambda wrapper here:Currently agent.flush()
does not have a way to wait for and include inflight spans. This is https://github.com/elastic/apm-agent-nodejs/issues/2294
This is a separate issue, however.
I'm happy to help look into writing tests on the agent for this, if you like.
Are you able to provide more details and a repro of that observation?
Here's two error cases we're seeing when we do not explicitly reference that socket (i.e. with the master branch client)
Error logs are here: lambda-error-logs.zip
With a simple lambda function handler that looks like this
const apm = require('elastic-apm-node').start({
})
exports.handler = apm.lambda(function handler (event, context, callback) {
const response = {
statusCode: 200,
body: "hello function."
};
callback(null, response)
})
API gateway requests to the lambda produce errors that look like this
{
"log.level": "error",
"@timestamp": "2021-10-02T18:42:52.091Z",
"log": {
"logger": "elastic-apm-node"
},
"ecs": {
"version": "1.6.0"
},
"message": "APM Server transport error: Error: write after end\n at writeAfterEnd (/var/task/node_modules/readable-stream/lib/_stream_writable.js:261:12)\n at Client.Writable.write (/var/task/node_modules/readable-stream/lib/_stream_writable.js:305:21)\n at Client.sendTransaction (/var/task/node_modules/elastic-apm-http-client/index.js:569:15)\n at Instrumentation.addEndedTransaction (/var/task/node_modules/elastic-apm-node/lib/instrumentation/index.js:241:22)\n at Instrumentation.wrappedAddEndedTransaction [as addEndedTransaction] (/var/task/node_modules/elastic-apm-node/lib/instrumentation/async-hooks.js:53:34)\n at Transaction.end (/var/task/node_modules/elastic-apm-node/lib/instrumentation/transaction.js:250:32)\n at captureAndMakeCompleter (/var/task/node_modules/elastic-apm-node/lib/lambda.js:48:11)\n at wrappedLambdaCallback (/var/task/node_modules/elastic-apm-node/lib/lambda.js:61:20)\n at Runtime.handler (/var/task/index.js:10:11)\n at Runtime.wrappedLambda [as handler] (/var/task/node_modules/elastic-apm-node/lib/lambda.js:101:17)\n at Runtime.handleOnce (/var/runtime/Runtime.js:66:25)"
}
{
"log.level": "error",
"@timestamp": "2021-10-02T18:42:52.111Z",
"log": {
"logger": "elastic-apm-node"
},
"ecs": {
"version": "1.6.0"
},
"message": "APM Server transport error: Error: write after end\n at writeAfterEnd (/var/task/node_modules/readable-stream/lib/_stream_writable.js:261:12)\n at Client.Writable.write (/var/task/node_modules/readable-stream/lib/_stream_writable.js:305:21)\n at Client.flush (/var/task/node_modules/elastic-apm-http-client/index.js:595:15)\n at Agent.flush (/var/task/node_modules/elastic-apm-node/lib/agent.js:553:21)\n at /var/task/node_modules/elastic-apm-node/lib/lambda.js:50:13\n at wrappedLambdaCallback (/var/task/node_modules/elastic-apm-node/lib/lambda.js:65:9)\n at Runtime.handler (/var/task/index.js:10:11)\n at Runtime.wrappedLambda [as handler] (/var/task/node_modules/elastic-apm-node/lib/lambda.js:101:17)\n at Runtime.handleOnce (/var/runtime/Runtime.js:66:25)"
}
{
"log.level": "error",
"@timestamp": "2021-10-02T18:42:52.111Z",
"log": {
"logger": "elastic-apm-node"
},
"ecs": {
"version": "1.6.0"
},
"message": "Flush error: write after end"
}
A single transaction makes it to to APM Server, but no further requests make it through until the next cold start.
Logs in the attached zip file are "case-callback.log"
+--------------------------------------------------+
With a simple lambda function that returns a promise
const apm = require('elastic-apm-node').start({})
exports.handler = apm.lambda(
function handler (event, context) {
return new Promise(function(resolve, reject){
const response = {
statusCode: 200,
body: "hello function."
};
resolve(response)
})
}
)
API gateway requests to the lambda produce error responses that look like this
% curl -i 'https://xxx.execute-api.us-west-2.amazonaws.com/default/function-name'
HTTP/2 500
date: Sat, 02 Oct 2021 18:59:47 GMT
content-type: application/json
content-length: 35
apigw-requestid: Gl_VkhowPHcEP-w=
{"message":"Internal Server Error"}
Logs in the attached zip are are "case-promise.log"
Addresses https://github.com/elastic/apm-nodejs-http-client/issues/165
This PR addresses the following
by ensuring the socket is
ref
ed, which will prevent the Lambda function from exiting early.Additionally, with this fix in place and with the agent communicating with the lambda extension prototype (vs. communicating directly with a remote APM server), the following problem
was not present.