elastic / require-in-the-middle

Module to hook into the Node.js require function
MIT License
168 stars 26 forks source link

Fixes allow listing mapped modules that resolve with an extension #88

Closed jsumners-nr closed 4 months ago

jsumners-nr commented 7 months ago

PR #82 originally included code to ignore the resolved file extension. This PR restores that functionality.

jsumners-nr commented 6 months ago

@trentm do you have time to take a look at this one?

trentm commented 6 months ago

@jsumners-nr Again sorry for the delay in looking at this.

I've gone back and forth on this a couple of times, and now I think the right answer here is to (a) NOT take this patch and (b) you should use this in your code (i.e. specify the .cjs extension in the hook arg):

const hook = new Hook(['@langchain/core/dist/callbacks/manager.cjs'], function (exports, name) {
    // ...
});

Let me explain.


In an earlier look at this, I thought that this should work just like the existing "sub-module/foo" test: https://github.com/elastic/require-in-the-middle/blob/826dbde2898ef4a3a4ddb0fe6d44287694238b5b/test/sub-module.js#L7-L21

This is what your patch would accomplish. It could also be accomplished with this cleaner patch to the handling that is already normalizing a require.resolved path by removing the .js extension:

diff --git a/index.js b/index.js
index 966b9c9..1b0cf68 100644
--- a/index.js
+++ b/index.js
@@ -47,7 +47,7 @@ if (Module.isBuiltin) { // Added in node v18.6.0, v16.17.0
 }

 // 'foo/bar.js' or 'foo/bar/index.js' => 'foo/bar'
-const normalize = /([/\\]index)?(\.js)?$/
+const normalize = /([/\\]index)?(\.(cjs|js))?$/

 // Cache `onrequire`-patched exports for modules.
 //

This normalization is effectively saying that a resolved load of ".../some-package/some-sub-path/foo.js" should be hookable via a Hook arg of some-package/some-sub-path/foo (i.e. without the extension). Likewise that a resolved load of ".../some-package/some-sub-path/bar/index.js" should be hookable via some-package/some-sub-path/bar.

The only difference with your case (with @langchain/core) is that the file extension is .cjs, right?

However Node's require() doesn't treat .cjs the same way it does .js. https://nodejs.org/api/modules.html#file-modules says:

If the exact filename is not found, then Node.js will attempt to load the required filename with the added extensions: .js, .json, and finally .node. When loading a file that has a different extension (e.g. .cjs), its full name must be passed to require(), including its file extension (e.g. require('./file.cjs')).

I think RITM should follow the same pattern and NOT normalize-away .cjs.

tl;dr

I think the correct RITM usage in this case is to specify the .cjs extension:

const hook = new Hook(['@langchain/core/dist/callbacks/manager.cjs'], function (exports, name) {
    // ...
});

Is there a reason you aren't able to do this in your code?


This should be tested and documented. I'll try to get a PR to test and doc this soon.

trentm commented 6 months ago

@jsumners-nr https://github.com/elastic/require-in-the-middle/pull/89 is my proposed test and doc update to clarify how to hook package-internal .cjs files.

jsumners-nr commented 6 months ago

I don't think that is going to work for us. We don't have separate instrumentations for modules loaded via require vs import. We rely on require-in-the-middle and import-in-the-middle to provide the module export for instrumentation. The issue is that folks like langchain are supplying separate CJS and ESM variants in one package via the exports map. All together, this means we supply a hook for the export name and expect to get back whatever that maps to.

jsumners-nr commented 5 months ago

@trentm have you had a chance to think about this?

trentm commented 5 months ago

@jsumners-nr Can we get more concrete with the hooking code that you are trying to get to work? I think I'm missing something.

The IITM Hook modules arg doesn't support anything but the top-level module name, e.g. @langchain/core, IIUC. So, unless I'm missing something, there is no way to get IitmHook(['@langchain/core/dist/callbacks/manager'], ...) to work.

There is support for an internals: true option as well, but I don't think that's what you are referring to.

Do you have some other in-progress PR to IITM to support something like that?

Also, I wonder if one potential point of confusion between us here is referring to an entry-point to the @langchain/core module. It has this in its package.json#exports:

    "./callbacks/manager": {
      "types": {
        "import": "./callbacks/manager.d.ts",
        "require": "./callbacks/manager.d.cts",
        "default": "./callbacks/manager.d.ts"
      },
      "import": "./callbacks/manager.js",
      "require": "./callbacks/manager.cjs"
    },

That means @langchain/core/callbacks/manager is a hook path that should work with RITM. However, in your example you are using @langchain/core/dist/callbacks/manager, which is NOT an entry-point defined by the module.

jsumners-nr commented 5 months ago

I'm doing some more research based on the information you have provided. At the moment, I have only been able to refactor the test to replicate the issue I am actually trying to solve:

test('handles mapped exports: picks up allow listed resolved module', { skip: nodeSupportsExports }, function (t) {
  t.plan(3)

  let hookHit = false
  const hook = new Hook(['@langchain/core/callbacks/manager'], function (exports) {
    hookHit = true
    return exports
  })

  const { Tool } = require('@langchain/core/tools')
  const MyTool = class MyTool extends Tool {
    _call () {
      t.pass()
    }
  }

  const tool = new MyTool()
  tool.call('foo', [() => { t.pass() }])
  t.equal(hookHit, true)

  hook.unhook()
})

In short:

  1. We try to hook @langchain/core/callbacks/manager (a listed export)
  2. We require @langchain/core/tools (a listed export)
  3. We expect our hook to be hit because the tools import also imports the callback manager script (https://github.com/langchain-ai/langchainjs/blob/48398048de3ff4502b3434f607de338edfef8d10/langchain-core/src/tools.ts#L2-L7)

What we will see is that the tools script is ultimately trying to import @langchain/core/dist/callbacks/manager.cjs. This is shown in the debugger screenshot below.

The "fix" in this PR does not solve this. But before I go deep into solving it, what are your thoughts around this @trentm?

Debugger Screenshot ![image](https://github.com/elastic/require-in-the-middle/assets/150050532/d6405d4c-05dc-40fa-b4b8-88189b7adf9c)
trentm commented 5 months ago

why it currently doesn't work for your case

Tracing the chain of loaded modules:

require('@langchain/core/tools')
// resolves via package.json#exports to node_modules/@langchain/core/tools.cjs, which does:
module.exports = require('./dist/tools.cjs');
// resolves to node_modules/@langchain/core/dist/tools.cjs, which does:
const manager_js_1 = require("./callbacks/manager.cjs");
// resolves to node_modules/@langchain/core/dist/callbacks/manager.cjs

DEBUG=require-in-the-middle debug output for that last "resolves to" (with newlines for some readability):

require-in-the-middle resolved filename "/Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core/dist/callbacks/manager.cjs" to
   module: @langchain/core (
    id: ./callbacks/manager.cjs,
    resolved: @langchain/core/dist/callbacks/manager.cjs,
    basedir: /Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core) +0ms

So, with the current RITM code, you need one of the following Hook args to match:

  1. absolute path "/Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core/dist/callbacks/manager.cjs", obviously not relevant for your case,
  2. matching that "resolved" (internal var is fullModuleName): @langchain/core/dist/callbacks/manager.cjs

RITM's matching of an entry point is only looking at the string directly passed to the require(...) function, the id var. So, entry-point matching only currently works for this directly:

require('@langchain/core/callbacks/manager')

and not with whatever internal require() call that happens to resolve to the same file path that some entry point refers to, e.g. require('./callbacks/manager.cjs'); from .../node_modules/@langchain/core/tools.cjs in this example.

implementation thoughts

Thinking about how RITM might implement what you are looking for.

> require.resolve('@langchain/core/callbacks/manager')
'/Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core/callbacks/manager.cjs'
> require.resolve('@langchain/core/bogus')
Uncaught:
Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: Package subpath './bogus' is not defined by "exports" in /Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core/package.json
...

I think this could work.

One subtle change in behaviour, that likely doesn't practically matter, is that this case (where the Hook arg includes both the entry-point and the package sub-module path) will match with with a different name value:

Hook(['@langchain/core/callbacks/manager', '@langchain/core/dist/callbacks/manager.cjs'], (name) => {
    console.log('Hook: name=%s', name)
})

I can't imagine that'll break anyone. If we're very conservative, that would be a major version bump.

jsumners-nr commented 5 months ago

👋 @trentm checking in to see if you've had a chance to consider my finding.

trentm commented 5 months ago

checking in to see if you've had a chance to consider my finding.

@jsumners-nr Did you see my previous comment?

jsumners-nr commented 5 months ago

checking in to see if you've had a chance to consider my finding.

@jsumners-nr Did you see my previous comment?

🤦 no. I failed to refresh the page before commenting and for some reason I never got a GitHub notification of an update.


Your idea sounds like a great solution. Would you like me to work on it?

trentm commented 5 months ago

Would you like me to work on it?

If you have the bandwidth, that would be great. It feels like I won't be able to get to this soon.

jsumners-nr commented 5 months ago

Would you like me to work on it?

If you have the bandwidth, that would be great. It feels like I won't be able to get to this soon.

Thanks. I'll get it into my queue.