Closed jsumners-nr closed 4 months ago
@trentm do you have time to take a look at this one?
@jsumners-nr Again sorry for the delay in looking at this.
I've gone back and forth on this a couple of times, and now I think the right answer here is to (a) NOT take this patch and (b) you should use this in your code (i.e. specify the .cjs
extension in the hook arg):
const hook = new Hook(['@langchain/core/dist/callbacks/manager.cjs'], function (exports, name) {
// ...
});
Let me explain.
In an earlier look at this, I thought that this should work just like the existing "sub-module/foo" test: https://github.com/elastic/require-in-the-middle/blob/826dbde2898ef4a3a4ddb0fe6d44287694238b5b/test/sub-module.js#L7-L21
This is what your patch would accomplish. It could also be accomplished with this cleaner patch to the handling that is already normalizing a require.resolve
d path by removing the .js
extension:
diff --git a/index.js b/index.js
index 966b9c9..1b0cf68 100644
--- a/index.js
+++ b/index.js
@@ -47,7 +47,7 @@ if (Module.isBuiltin) { // Added in node v18.6.0, v16.17.0
}
// 'foo/bar.js' or 'foo/bar/index.js' => 'foo/bar'
-const normalize = /([/\\]index)?(\.js)?$/
+const normalize = /([/\\]index)?(\.(cjs|js))?$/
// Cache `onrequire`-patched exports for modules.
//
This normalization is effectively saying that a resolved load of ".../some-package/some-sub-path/foo.js" should be hookable via a Hook
arg of some-package/some-sub-path/foo
(i.e. without the extension). Likewise that a resolved load of ".../some-package/some-sub-path/bar/index.js" should be hookable via some-package/some-sub-path/bar
.
The only difference with your case (with @langchain/core
) is that the file extension is .cjs
, right?
However Node's require()
doesn't treat .cjs
the same way it does .js
. https://nodejs.org/api/modules.html#file-modules says:
If the exact filename is not found, then Node.js will attempt to load the required filename with the added extensions:
.js
,.json
, and finally.node
. When loading a file that has a different extension (e.g..cjs
), its full name must be passed torequire()
, including its file extension (e.g.require('./file.cjs')
).
I think RITM should follow the same pattern and NOT normalize-away .cjs
.
I think the correct RITM usage in this case is to specify the .cjs
extension:
const hook = new Hook(['@langchain/core/dist/callbacks/manager.cjs'], function (exports, name) {
// ...
});
Is there a reason you aren't able to do this in your code?
This should be tested and documented. I'll try to get a PR to test and doc this soon.
@jsumners-nr https://github.com/elastic/require-in-the-middle/pull/89 is my proposed test and doc update to clarify how to hook package-internal .cjs files.
I don't think that is going to work for us. We don't have separate instrumentations for modules loaded via require
vs import
. We rely on require-in-the-middle
and import-in-the-middle
to provide the module export for instrumentation. The issue is that folks like langchain
are supplying separate CJS and ESM variants in one package via the exports map. All together, this means we supply a hook for the export name and expect to get back whatever that maps to.
@trentm have you had a chance to think about this?
@jsumners-nr Can we get more concrete with the hooking code that you are trying to get to work? I think I'm missing something.
The IITM Hook modules
arg doesn't support anything but the top-level module name, e.g. @langchain/core
, IIUC. So, unless I'm missing something, there is no way to get IitmHook(['@langchain/core/dist/callbacks/manager'], ...)
to work.
There is support for an internals: true
option as well, but I don't think that's what you are referring to.
Do you have some other in-progress PR to IITM to support something like that?
Also, I wonder if one potential point of confusion between us here is referring to an entry-point to the @langchain/core
module. It has this in its package.json#exports:
"./callbacks/manager": {
"types": {
"import": "./callbacks/manager.d.ts",
"require": "./callbacks/manager.d.cts",
"default": "./callbacks/manager.d.ts"
},
"import": "./callbacks/manager.js",
"require": "./callbacks/manager.cjs"
},
That means @langchain/core/callbacks/manager
is a hook path that should work with RITM.
However, in your example you are using @langchain/core/dist/callbacks/manager
, which is NOT an entry-point defined by the module.
I'm doing some more research based on the information you have provided. At the moment, I have only been able to refactor the test to replicate the issue I am actually trying to solve:
test('handles mapped exports: picks up allow listed resolved module', { skip: nodeSupportsExports }, function (t) {
t.plan(3)
let hookHit = false
const hook = new Hook(['@langchain/core/callbacks/manager'], function (exports) {
hookHit = true
return exports
})
const { Tool } = require('@langchain/core/tools')
const MyTool = class MyTool extends Tool {
_call () {
t.pass()
}
}
const tool = new MyTool()
tool.call('foo', [() => { t.pass() }])
t.equal(hookHit, true)
hook.unhook()
})
In short:
@langchain/core/callbacks/manager
(a listed export)@langchain/core/tools
(a listed export)What we will see is that the tools script is ultimately trying to import @langchain/core/dist/callbacks/manager.cjs
. This is shown in the debugger screenshot below.
The "fix" in this PR does not solve this. But before I go deep into solving it, what are your thoughts around this @trentm?
Tracing the chain of loaded modules:
require('@langchain/core/tools')
// resolves via package.json#exports to node_modules/@langchain/core/tools.cjs, which does:
module.exports = require('./dist/tools.cjs');
// resolves to node_modules/@langchain/core/dist/tools.cjs, which does:
const manager_js_1 = require("./callbacks/manager.cjs");
// resolves to node_modules/@langchain/core/dist/callbacks/manager.cjs
DEBUG=require-in-the-middle
debug output for that last "resolves to" (with newlines for some readability):
require-in-the-middle resolved filename "/Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core/dist/callbacks/manager.cjs" to
module: @langchain/core (
id: ./callbacks/manager.cjs,
resolved: @langchain/core/dist/callbacks/manager.cjs,
basedir: /Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core) +0ms
So, with the current RITM code, you need one of the following Hook args to match:
fullModuleName
): @langchain/core/dist/callbacks/manager.cjs
RITM's matching of an entry point is only looking at the string directly passed to the require(...)
function, the id
var.
So, entry-point matching only currently works for this directly:
require('@langchain/core/callbacks/manager')
and not with whatever internal require()
call that happens to resolve to the same file path that some entry point refers to, e.g. require('./callbacks/manager.cjs');
from .../node_modules/@langchain/core/tools.cjs
in this example.
Thinking about how RITM might implement what you are looking for.
@foo/bar/baz
.id
is a direct match, i.e. require('@foo/bar/baz')
, then call the hook.require(...)
that resolves to some file path under package @foo/bar
, RITM would call require.resolve('@foo/bar/baz')
to see if it resolves to some path, watching out for exceptions:> require.resolve('@langchain/core/callbacks/manager')
'/Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core/callbacks/manager.cjs'
> require.resolve('@langchain/core/bogus')
Uncaught:
Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: Package subpath './bogus' is not defined by "exports" in /Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core/package.json
...
@foo/bar/*
Hook arg it received.require(...)
that resolves to a path inside @foo/bar
, RITM checks to see if the full filename
is one of those resolved paths, and if so, then call the hook (i.e. this is a new match).I think this could work.
One subtle change in behaviour, that likely doesn't practically matter, is that this case (where the Hook arg includes both the entry-point and the package sub-module path) will match with with a different name
value:
Hook(['@langchain/core/callbacks/manager', '@langchain/core/dist/callbacks/manager.cjs'], (name) => {
console.log('Hook: name=%s', name)
})
I can't imagine that'll break anyone. If we're very conservative, that would be a major version bump.
👋 @trentm checking in to see if you've had a chance to consider my finding.
checking in to see if you've had a chance to consider my finding.
@jsumners-nr Did you see my previous comment?
checking in to see if you've had a chance to consider my finding.
@jsumners-nr Did you see my previous comment?
🤦 no. I failed to refresh the page before commenting and for some reason I never got a GitHub notification of an update.
Your idea sounds like a great solution. Would you like me to work on it?
Would you like me to work on it?
If you have the bandwidth, that would be great. It feels like I won't be able to get to this soon.
Would you like me to work on it?
If you have the bandwidth, that would be great. It feels like I won't be able to get to this soon.
Thanks. I'll get it into my queue.
PR #82 originally included code to ignore the resolved file extension. This PR restores that functionality.