aws / aws-xray-sdk-node

The official AWS X-Ray SDK for Node.js.
Apache License 2.0
267 stars 154 forks source link

Replace cls-hooked with native async_hooks usage #338

Open simonbuchan opened 3 years ago

simonbuchan commented 3 years ago

This is an odd one, I'm raising it here mostly as a signpost to what's going on for other people hitting it, since it's not really an AWS issue (really, it seems like a rollup or node issue), but possibly there's something this package can do here.

Basically, I use rollup to create my lambda deployment packages, and when I tried adding aws-xray-sdk-core@3.2.0 it caused the lambda (runtime nodejs12.x) to fail with a core dump(!) with the message:

/var/lang/bin/node[7]: ../src/async_wrap.cc:308:void node::SetupHooks(const v8::FunctionCallbackInfo<v8::Value>&): Assertion `env->async_hooks_init_function().IsEmpty()' failed.

If I run the bundled .js file locally with node v14.10.0, I get pretty much the same error:

[19312]: c:\ws\src\async_wrap.cc:455: Assertion `env->async_hooks_init_function().IsEmpty()' failed.

With some experimenting, I've narrowed it down to simply importing async-hook-jl, which is imported by cls-hooked when it detects node is less than v8. Due to the way rollup works, this is actually always included and run. I'm not sure if there's a better option, but I got this to work under rollup with the following mini-plugin, which also disables the pkginfo module that's throwing under rollup too:

{
  name: "exclude-modules",
  files: [
    "cls-hooked/context-legacy",
    "pkginfo",
  ].map((id) => require.resolve(id)),
  load(file) {
    if (this.files.includes(file)) {
      return { code: "" };
    }
  },
},

Ideally, since node <10 is EOL, it would be nice if the cls-hooked dependency could be dropped and the native async_hooks module was used directly, or a more modern replacement used, which would avoid this particular headache, but since you've just updated to it to drop node <8 support, I'm guessing that's not likely.

willarmiros commented 3 years ago

Hi @simonbuchan, Thanks for reporting this. I'm not too familiar with rollup but this:

which is imported by cls-hooked when it detects node is less than v8. Due to the way rollup works, this is actually always included and run

does not sound like desirable behavior, is that a bug that you could open with Rollup?

In terms of removing cls-hooked and using async_hooks directly, we'd rather avoid reinventing the wheel and having to implement a CLS on top of the async_hooks lib ourselves. Node 14 introduced a native AsyncLocalStorage that we could probably replace cls-hooked with, but we'd still need the cls-hooked dependency until Node 10 and 12 were EOL.

simonbuchan commented 3 years ago

It's not really a Rollup bug, it's a design choice AFAIK. Rollup, unlike Webpack, exports a flat ES module and expects ES module inputs if you don't use plugins (you pretty much always do), so it can't realistically conditionally run module-level code - though it's probably possible.

If there's an action to take here for this package, the nicest option would be replacing cls-hooked: the problematic legacy code is for EOL node versions, and the library hasn't been updated in 3 years. There are quite a few forks, but I haven't gone through them to see if there's a plausible path forwards. Perhaps instead importing from cls-hooked/context would be enough, though now you're depending on internals to a small extent.

But really, there's not that much code to track async context, you could quite easily associate segments automatically as simply as this:

import { executionAsyncId, createHook } from "async_hooks";

// This doesn't update a global (`process.namespaces[name].active`) on `before()` and `after()` hooks,
// as `cls-hooked` does, it simply depends on `executionAsyncId` being up to date, so it may
// be a bit slower, though honestly I doubt it.

const segments = new Map();

export function setSegment(segment) {
  segments.set(executionAsyncId(), segment);
}

export function getSegment() {
  return segments.get(executionAsyncId());
}

const hook = createHook({
  init(id, type, triggerId) {
    segments.set(id, segments.get(triggerId));
  },
  destroy(id) {
    segments.delete(id);
  }
});

hook.enable();

I've only tested this code in v14.10, but this API surface hasn't changed much since v8.6.0, so barring outright bugs (possible with an API still marked as experimental!) I doubt anything different would be seen.

Obviously you would need to do testing across your supported versions, and avoiding regressions, such as with switching between automatic and manual etc, but you would have to do that with using AsyncLocalStorage too, and this is here now, and you're transitively already using it.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs in next 7 days. Thank you for your contributions.

willarmiros commented 3 years ago

@simonbuchan thanks for the write up and POC, I think this is a fair feature request given cls-hooked's rustiness. I'm going to change the name of this issue and consider it a feature request for the same if you don't mind. This would probably be considered a breaking change since we have APIs that directly expose cls-hooked Namespaces, but it's still worth pursuing especially for a major version.

simonbuchan commented 3 years ago

Sure sounds good!

Didn't see that you expose namespace instances: you might be able to reimplement them, but I agree it's not worth the risk.

chris-pardy commented 3 years ago

@simonbuchan @willarmiros Given that this package is already limited to Node versions > 8 it seems reasonable to do a very small patch to simply import cls-hooked/context rather than cls-hooked that would skip the version detection but not require reimplementation of the cls-hooked feature set and types.

chris-pardy commented 3 years ago

@simonbuchan @willarmiros Since 12.7 there's been native support for Continuation Local Storage (https://nodejs.org/docs/latest-v12.x/api/async_hooks.html#async_hooks_class_asynclocalstorage) 10.x leaves LTS support in 3 months (April 2021). at which point I would assume that xray-sdk could bring up the minimum engine version and cut a major version release that just used the native CLS features?

cheruvian commented 2 years ago

Any update on this?

With the 25-30% performance improvements using AsyncLocalStorage vs cls-hooked this is something that would be a big win.

Not to mention being a part of the standard library means that it should work with a wider range of libraries.

chris-pardy commented 2 years ago

Any update on this?

@cheruvian x-ray will no longer core dump when using rollup. As I recall there are 2 paths forward. The first is to release a major version update since the API for x-ray has some cls-hooked classes in it. The second option is to reimplement the cls-hooked bits that are part of the x-ray API using AsyncLocalStorage. That could avoid a major version update for the most part although it does introduce some risk.

willarmiros commented 2 years ago

@chris-pardy is correct on this. Also, the AWS Distro for OpenTelemtry JavaScript uses AsyncLocalStorage for context management by default, and has first-class support with AWS X-Ray. I would recommend checking it out for such performance benefits.