louis030195 opened 1 year ago
Hello @louis030195!

I believe it could be possible through the `exports` field in `package.json`, where we could specify a different entrypoint for each engine: https://runtime-keys.proposal.wintercg.org/
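To make the idea concrete, a sketch of what such a `package.json` could look like. The file paths and the exact set of runtime keys are assumptions for illustration, not the actual package layout; the key names follow the WinterCG runtime-keys proposal linked above:

```json
{
  "name": "tiktoken",
  "exports": {
    ".": {
      "edge-light": "./dist/edge.js",
      "workerd": "./dist/edge.js",
      "node": "./dist/node.js",
      "default": "./dist/node.js"
    }
  }
}
```

Bundlers and runtimes that understand these conditions would then resolve the right entrypoint at build/install time, without any runtime detection code.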
I do think it necessitates an API change: exposing a magic `init()` function that performs the `import()` of the WASM files when needed (the `?module` suffix for Vercel etc.).
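Roughly what I have in mind for that `init()` (names and signature are assumptions, not an existing API): a memoized promise, so the WASM dynamic import runs at most once and a failed load can be retried.

```ts
// Module-level memo: null until init() is first called.
let initPromise: Promise<void> | null = null;

async function defaultLoader(): Promise<void> {
  // The environment-specific dynamic import would go here, e.g. on Vercel Edge:
  //   const mod = await import("./tiktoken_bg.wasm?module");
  // (hypothetical path for illustration)
}

function init(loader: () => Promise<void> = defaultLoader): Promise<void> {
  if (!initPromise) {
    initPromise = loader().catch((e) => {
      initPromise = null; // reset so callers can retry after a failed load
      throw e;
    });
  }
  return initPromise;
}
```

Callers would `await init()` once before using any WASM-backed function; concurrent calls share the same promise.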
At the moment I'm swamped with a lot of work, but if you want to take a crack at it, I'd love to help 😃
Interesting! Btw, do you know how to unit test the different environments? Maybe https://edge-runtime.vercel.app/packages/vm?
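For what it's worth, vitest has an `edge-runtime` test environment built on that package, so a config along these lines might be enough to run the suite inside an emulated edge runtime (a sketch, assuming `@edge-runtime/vm` is installed as a dev dependency):

```ts
// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // runs test files inside Vercel's edge runtime emulation
    environment: "edge-runtime",
  },
});
```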
ATM I found this hack somewhere (probably from you) which works both in Node and on Vercel:
```ts
export async function getEncoding(
  encoding: TiktokenEncoding,
  options?: {
    signal?: AbortSignal;
    extendedSpecialTokens?: Record<string, number>;
  }
) {
  if (!(encoding in cache)) {
    cache[encoding] = getFetch()(`https://tiktoken.pages.dev/js/${encoding}.json`, {
      signal: options?.signal,
    })
      .then((res) => res.json())
      .catch((e) => {
        delete cache[encoding];
        throw e;
      });
  }

  const enc = new Tiktoken(await cache[encoding], options?.extendedSpecialTokens);

  // // @ts-ignore
  // const registry = new FinalizationRegistry((heldValue) => {
  //   heldValue.free()
  // });
  // registry.register(enc);

  return enc;
}
```
I'm not 100% sure where the `.free()` call went; I'm a bit afraid of memory leaks.
Some preliminary work has been done here: https://github.com/dqbd/package-exports-edge, where different dev environments are set up and tested using `vitest` with `puppeteer` etc.
The snippet was most likely from the recent PR done on LangChain, where I've swapped the WASM bindings for a JS port instead (`js-tiktoken`). The rationale was the excessive bundle size and the relative complexity of getting started with LangChainJS: https://github.com/hwchase17/langchainjs/pull/1239. That is why no `.free()` is needed 😄
You might want to consider swapping the implementation as well, weighing trade-offs such as worse performance (around 5x slower, although a PR can reduce the difference to just 2.5x).
Regardless, I think the general goal would be to create an isomorphic library, which will load the JS version on edge and WASM in Node.
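Where build-time `exports` conditions aren't enough, a runtime check could pick the implementation instead. A sketch with hedged assumptions: Vercel's edge runtime defines a global `EdgeRuntime` string, and Node exposes `process.versions.node` (the helper names are mine, not an existing API):

```ts
// Best-effort runtime detection; accessed via globalThis so the code
// type-checks without Node-specific type declarations.
function isEdgeRuntime(): boolean {
  return typeof (globalThis as any).EdgeRuntime === "string";
}

function isNode(): boolean {
  const p = (globalThis as any).process;
  return typeof p !== "undefined" && !!p.versions?.node;
}
```

The library could then `await import()` the JS port when `isEdgeRuntime()` is true and the WASM bindings when `isNode()` is, falling back to the JS port elsewhere.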
Hi!
This is a wonderful lib that I use in https://github.com/different-ai/embedbase/tree/main/sdk/embedbase-js.

I was wondering if it would be possible to automatically determine the environment and import accordingly (e.g. browser, Next.js, Vercel Edge, etc.)? It would be really great and reduce verbosity. Happy to contribute!