getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License
7.84k stars 1.54k forks source link

[v8] Update for ESM monkeypatch + ESM file structure #10046

Closed AbhiPrasad closed 5 months ago

AbhiPrasad commented 7 months ago

We’re going to be adding proper support for ESM in the Sentry repo.

Requirement:

### Tasks
- [ ] https://github.com/getsentry/sentry-javascript/pull/10069
- [x] (pre-v8) Figure out how to use `dynamicRequire` with ESM
- [x] (pre-v8) Remove usage of `require.main.filename` for module name resolution #10061
- [x] (v8) Vendor `https-proxy-agent` which is cjs only #10088
- [x] (v8) Change build to output `mjs`/`cjs` for esm/cjs
- [x] (v8) Remove conditional import of `worker_threads` #10791
- [x] (v8) Update `inspector.d.ts` to work with esm and cjs
- [x] (v8) Remove require call in `require('inspector')` in `LocalVariables` integration
- [x] (v8) Update Rollup
- [ ] https://github.com/getsentry/sentry-javascript/pull/10928
- [ ] https://github.com/getsentry/sentry-javascript/issues/11066
- [ ] https://github.com/getsentry/sentry-javascript/issues/11067

Read below for some justification and information

Module Customization Hooks for Monkey Patching

For ESM bundles on server-side, we’ll require a minimum Node version of 18.6.0 or 20.6.0 or higher. Otherwise we’ll support CJS bundles for Node 14+ generally. This is because we want access to Node customization hooks, which allows us to add monkeypatching for esm modules programatically (doesn't require command-line loader).

We can take some inspiration here from https://github.com/DataDog/import-in-the-middle

import { register } from 'node:module';

register('./sentry-patch.mjs', import.meta.url);

Registering hooks to affect an import still needs to happen before the import is resolved, so there are ordering issues, but I think we can work around this by recommending that Sentry.init is called as soon as possible, and encouraging Sentry.addIntegration patterns for integrations that require objects from different libraries (like express with app).

The modules we need register via register API:

~We need to re-evaluate if we need monkeypatching for the database integrations now that we have OpenTelemetry.~

Change File Structure

We’ll need to move our file structure to the following:

{
  // at build time we strip `build/X`
  "main": "build/cjs/index.cjs",
  "module": "build/esm/index.mjs",
  "types": "build/types/index.d.cts",
  // note: this is an opportunity for us to explore multiple subpaths, like
  // exporting from `@sentry/browser/utils`
  "exports": {
    "./package.json": "./package.json",
    ".": {
      // esm
      "import": {
        // path must be relative
        "types": "./build/types/index.d.mts",
        "default": "./build/esm/index.mjs"
      },
      // cjs
      "require": {
        // path must be relative
        "types": "./build/types/index.d.cts",
        "default": "./build/esm/index.mjs"
      }
      // we might need to expose an ESM only export for customization hooks
      // to use via node --loader=... API
    }
  },
  "typesVersions": {
    "<4.9": {
      "build/types/index.d.cts": [
        "build/types-ts3.8/index.d.cts"
      ],
      "build/types/index.d.mts": [
        "build/types-ts3.8/index.d.mts"
      ]
    }
  },
  "files": [
    "cjs",
    "esm",
    "types",
    "types-ts3.8"
  ]
}

Shameless promo: Watch my talk if you want more details about this.

We also should change all node standard library imports to import from node:X.

Make dynamicRequire/loadModule work with ESM import

This exists to trick webpack, but we use it all over the codebase. We need to add an esm compatible way.

One thing we can do is introduce an async dynamicRequire that simulates await import, and then build time replace the functionality of dynamicRequire to use import or require under the hood?

AbhiPrasad commented 7 months ago

To validate our package.json setup, we'll need to add https://www.npmjs.com/package/@arethetypeswrong/cli as a validation check in CI.

mydea commented 7 months ago

I don't think we'll need to worry about node:http and node:https, as I guess for node this will be taken care of by the otel instrumentation?

timfish commented 7 months ago

There is also usage of require.main.filename for module name resolution

AbhiPrasad commented 7 months ago

I don't think we'll need to worry about node:http and node:https, as I guess for node this will be taken care of by the otel instrumentation?

We'll still need something to add breadcrumbs right? Or does that work?

AbhiPrasad commented 7 months ago

There is also usage of require.main.filename for module name resolution

Related GH issue: https://github.com/nodejs/node/issues/49440

mydea commented 7 months ago

We'll still need something to add breadcrumbs right? Or does that work?

For this we hook into the otel instrumentation and add breadcrumbs from there - see the http integration in node-experimental!

timfish commented 7 months ago

There is also usage of require.main.filename for module name resolution

Turns out process.argv[1] will be and has always been the entry point in Node so we can use that over require.main.filename. The only place this doesn't hold true is Electron where this is automatically fetched from package.json > main and doesn't end up in argv

timfish commented 7 months ago

// note: this is an opportunity for us to explore multiple subpaths, like // exporting from @sentry/browser/utils

package.json exports support was added in Node v12.7.0. Before that, the only way to support subpaths is to output the cjs files into a matching directory in the package root.

https://nodejs.org/api/packages.html#exports

The history suggests conditional exports was added even later but I guess it will still fallback to main.

Lms24 commented 7 months ago

One thing we might need to consider is our usage of http-proxy-agent in the Node transport. Whenever I used ESM versions of our SDKs in Vite-based frameworks (Sveltekit, Astro) it seemed like Vite wasn't able to correctly work with this dependency because it's CJS only (so even after experimentally removing stuff like the http integrations to avoid dynamic requires, this seemed to be the problem I didn't find a workaround for).

Can we somehow get rid of this dependency?

Chances are that this problem is limited to Vite as https://github.com/getsentry/sentry-javascript/pull/10042 shows. However, while we can change the respective vite config setting in Astro for our users, it's not as easy in other frameworks.

I'd vote we look at this once we made the other changes to check if it's actually still a problem by then. Maybe it was just a by-product of us using older package.json properties or the generic file extensions.

timfish commented 7 months ago

One thing we might need to consider is our usage of http-proxy-agent in the Node transport Can we somehow get rid of this dependency?

It's on my todo list. The latest version is fully typescript so I think I started a PR to vendor it.

EDIT

Ah I remembered what the issue was, they appear to have dropped support for older node versions: https://github.com/getsentry/sentry-javascript/issues/9199#issuecomment-1860670538

timfish commented 7 months ago
import { register } from 'node:module';

register('./sentry-patch.mjs', import.meta.url);

Will this work with bundling?

Since register accepts a URL, it also supports data URLs so we can pass the script in the same way we do for the Anr worker.

So the following code:

import { register } from "node:module";

const hookScript = Buffer.from(`
export async function initialize() {
  console.log("initialize");
}

export async function resolve(specifier, context, nextResolve) {
  console.log("resolve", specifier, context);
  return nextResolve(specifier);
}

export async function load(url, context, nextLoad) {
  console.log("load", url, context);
  return nextLoad(url);
}
`);

register(
  new URL(`data:application/javascript;base64,${hookScript.toString("base64")}`),
  import.meta.url
);

const inspect = await import("node:inspector/promises");

Outputs:

initialize
resolve node:inspector/promises {
  conditions: [ 'node', 'import', 'node-addons' ],
  importAttributes: {},
  parentURL: 'file:///Users/tim/Documents/Repositories/node-test/index.mjs'
}
load node:inspector/promises { format: undefined, importAttributes: {} }
AbhiPrasad commented 7 months ago

Will this work with bundling?

I have to test it out, but luckily the opt-out is just asking folks to use node --loader=...

GauBen commented 7 months ago

To validate our package.json setup, we'll need to add npmjs.com/package/@arethetypeswrong/cli as a validation check in CI.

I'd recommend publint too as it currently finds error that arethetypeswrong does not find: https://publint.dev/@sentry/node

timfish commented 6 months ago

We don't need to concern ourselves with bundling issues for now. If users bundle their code, libraries will not be loaded via the import hook anyway!

I was initially left confused over how import-in-the-middle actually works. The docs say that module customisation hooks run in their own thread so how does it call the hooks in "userland" from the hook thread? It turns out they inject code into the "userland" side to hook this up and then they can use shared state in global variables: https://github.com/DataDog/import-in-the-middle/blob/c3c2c52c1915b47994af59d507c59029c1f1fae9/hook.js#L302-L324

After experimenting and consulting the docs I can conclude that there is no way to have import hooks work fully without passing a command line argument to node. As per the ESM spec, imports are resolved and loaded before any code is executed.

The node.js docs say the import hook registration should be via the --import arg:

Using --import ensures that the hooks are registered before any application files are imported, including the entry point of the application. Alternatively, register can be called from the entry point, but dynamic import() must be used for any code that should be run after the hooks are registered.

So calling register in the regular entry code can only affect dynamic imports!

If we add a package export for /register which registers our hook, the cli args will be reduced to:

node --import @sentry/node/register app.js
AbhiPrasad commented 5 months ago

I'm going to close this issue. We now have 3 follow up areas:

  1. Remix SDK monkeypatches @remix-run/server-runtime via a dynamic require. With vite powered remix, we may get better APIs to do monkeypatching here. For now we should debate if we temporarily make the Remix SDK

Opened https://github.com/getsentry/sentry-javascript/issues/11067 to track ESM support for Remix SDK.

  1. The serverless SDKs use require to monkeypatch GCP and AWS related libraries. This does not work with ESM (and it gets even more complicated with bundling). For now we made those SDKs CJS only.

Opened https://github.com/getsentry/sentry-javascript/issues/11066 to track ESM support for serverless SDKs.

  1. OpenTelemetry ESM/Bundling/Docs gaps and problems

Overall we've identified some issues with OpenTelemetry instrumentation with ESM (even more problematic when bundling). We created https://github.com/getsentry/sentry-javascript/issues/11070 to track these issues to follow up.