Open SimonSimCity opened 4 years ago
Just FYI, I think it's extremely important for this issue to have a deep understanding of the event loop:
The callbacks, set by async-hooks
are called as process.nextTick()
- so they are executed immediately after the callback requesting it finished execution - according to the last slide on the youtube video and the description about async-hooks (https://nodejs.org/api/async_hooks.html#async_hooks_asynchronous_context_example).
Can confirm this cause problems on our production too, about 30% of process CPU time
I suggest this code
if (asyncIds) {
const index = asyncIds.indexOf(asyncId)
asyncIds.splice(index, 1)
}
has O(N) complexity and slows things down
I realize this is a year later, apologies for that.
Here is a first (possibly) naive patch to the agent that changes to 0(N)
complexity to O(1)
:
diff --git a/lib/instrumentation/async-hooks.js b/lib/instrumentation/async-hooks.js
index 1dd168ff..df2a1b0a 100644
--- a/lib/instrumentation/async-hooks.js
+++ b/lib/instrumentation/async-hooks.js
@@ -77,12 +77,12 @@ module.exports = function (ins) {
activeTransactions.set(asyncId, transaction)
// Track the context by the transaction
- let asyncIds = contexts.get(transaction)
+ const asyncIds = contexts.get(transaction)
if (!asyncIds) {
- asyncIds = []
- contexts.set(transaction, asyncIds)
+ contexts.set(transaction, new Set([asyncId]))
+ } else {
+ asyncIds.add(asyncId)
}
- asyncIds.push(asyncId)
const span = ins.bindingSpan || ins.activeSpan
if (span) activeSpans.set(asyncId, span)
@@ -103,12 +103,10 @@ module.exports = function (ins) {
function destroy (asyncId) {
const span = activeSpans.get(asyncId)
const transaction = span ? span.transaction : activeTransactions.get(asyncId)
-
if (transaction) {
const asyncIds = contexts.get(transaction)
if (asyncIds) {
- const index = asyncIds.indexOf(asyncId)
- asyncIds.splice(index, 1)
+ asyncIds.delete(asyncId)
}
}
However, in my testing (load testing a demo app and looking at a flamegraph), my changes don't make an effective change in performance on what I'd imagine is a typical application.
To try to create an extreme case, I have a demo express app with an endpoint handler like this:
function deep_callstack(req, res, next) { // my handler
asyncRecurseN(1000, function () {
setTimeout(function () {
res.send('deep callstack done')
}, 1)
})
}
function asyncRecurseN(n, cb) {
if (n <= 0) {
process.nextTick(cb)
return
} else {
process.nextTick(asyncRecurseN, n-1, cb)
}
}
Putting that under load gen and getting a flamegraph shows that I can reduce the amount of time that "destroy" async_hook method is top-of-stack. However, even in this extreme case (over 1000 async ops in the context of this one transaction with no spans) the overall app improvement was only on the order of a few percentage points.
If anyone still has this issue, or data on what condition triggered this issue (millions of async ops for a single APM transaction or span??), I'd be very interested to hear.
In a script of mine, where I heavily use use callbacks, seems to get an extremely high spike of calls of the function
destroy
in theasync-hooks
file. https://github.com/elastic/apm-agent-nodejs/blob/master/lib/instrumentation/async-hooks.js#L103All the green spans are for
internal/async_hooks.js
, and the blue below is the mentioneddestroy
function.For me it's now a question if we should not remove this async monitoring all together. It uses a lot of computation power which could be used for the actual task.