Closed by djaffer 1 year ago
Never mind, use the tiktoken-node package instead. You should state clearly that this will not work with Node.js without module support, which makes it pointless for the backend.
Hi @djaffer,
I'm not sure where the issue is, as Node.js is well supported for both the lite
and the full-fledged variants. Could you please share a reproducible example?
Full fledged example (tested on v18.16.0)
const { get_encoding } = require("@dqbd/tiktoken");
const encoding = get_encoding("gpt2");
const tokens = encoding.encode("noone is there");
encoding.free();
Lite example (tested on v18.16.0)
const { Tiktoken } = require("@dqbd/tiktoken/lite");
const cl100k_base = require("@dqbd/tiktoken/encoders/cl100k_base.json");
const encoding = new Tiktoken(
cl100k_base.bpe_ranks,
cl100k_base.special_tokens,
cl100k_base.pat_str
);
const tokens = encoding.encode("hello world");
encoding.free();
console.log({ tokens });
It gave a Rust null pointer error: Uncaught (in promise) Error: null pointer passed to rust
Could you please share a reproducible example? Thanks
I encountered this same error when I tried to encode a string after the encoding had been freed:
const { get_encoding } = require("@dqbd/tiktoken");
const encoding = get_encoding("gpt2");
const tokens = encoding.encode("noone is there");
encoding.free();
const otherTokens = encoding.encode("second encoding"); // <-- this throws Error: null pointer passed to rust
I had the same issue. Moving encoding.free(); to the end, after all of the calls to encoding.encode, appeared to solve the issue!
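For what it's worth, here is a minimal sketch of one way to make "free only after the last encode call" hard to get wrong. The withEncoding helper is hypothetical (it is not part of @dqbd/tiktoken), and createFakeEncoding is only a stand-in that imitates the behaviour described in this thread (encode throws once free has been called) so that the snippet runs without the package installed.

```javascript
// Hypothetical helper, not part of @dqbd/tiktoken: runs `fn` with a fresh
// encoder and frees it exactly once, even if `fn` throws.
function withEncoding(createEncoding, fn) {
  const encoding = createEncoding();
  try {
    return fn(encoding);
  } finally {
    encoding.free();
  }
}

// Stand-in encoder (an assumption for illustration only). It mimics the
// behaviour reported above: encode() throws after free() has been called.
function createFakeEncoding() {
  let freed = false;
  return {
    encode(text) {
      if (freed) throw new Error("null pointer passed to rust");
      // Fake "tokens": one number per code point, NOT real BPE tokens.
      return Array.from(text, (ch) => ch.codePointAt(0));
    },
    free() {
      freed = true;
    },
  };
}

// Every encode call happens inside the callback, before free() runs.
const result = withEncoding(createFakeEncoding, (encoding) => ({
  first: encoding.encode("noone is there"),
  second: encoding.encode("second encoding"),
}));

console.log(result.first.length, result.second.length);
```

With the real package, the factory passed in would presumably be something like () => get_encoding("gpt2"), as in the examples above.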
So the issue is that we have to reinitialize the encoding after freeing it. Is that good practice, or can we reuse the encoding?
This works
function getTokens(){
const encoding = new Tiktoken(
cl100k_base.bpe_ranks,
cl100k_base.special_tokens,
cl100k_base.pat_str
);
const tokens = encoding.encode("hello world");
encoding.free();
console.log({ tokens });
return tokens;
}
The version below does not work. I thought initializing multiple times was bad practice.
const encoding = new Tiktoken(
cl100k_base.bpe_ranks,
cl100k_base.special_tokens,
cl100k_base.pat_str
);
function getTokens(){
const tokens = encoding.encode("hello world");
encoding.free();
console.log({ tokens });
return tokens;
}
Multiple initialisation is definitely fine, as seen in your first example. However, in the second example, the encoder is freed inside getTokens and then accessed again on any later call, which is invalid.
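If the goal is to avoid re-creating the encoder on every call, one possible arrangement is to keep a single encoder alive across calls and free it only when you are done with it. The sketch below is only an illustration: createTokenCounter and dispose are hypothetical names, and createFakeEncoding is a stand-in that imitates the use-after-free error so that the snippet runs without the package installed.

```javascript
// Hypothetical wrapper, not part of @dqbd/tiktoken: lazily creates one
// encoder, reuses it across calls, and re-creates it after dispose().
function createTokenCounter(createEncoding) {
  let encoding = null;
  return {
    getTokens(text) {
      if (encoding === null) encoding = createEncoding();
      return encoding.encode(text);
    },
    dispose() {
      if (encoding !== null) {
        encoding.free();
        encoding = null; // never keep a reference to a freed encoder
      }
    },
  };
}

// Stand-in encoder (an assumption for illustration only); counts how many
// times it was created and throws if used after free().
let created = 0;
function createFakeEncoding() {
  created += 1;
  let freed = false;
  return {
    encode(text) {
      if (freed) throw new Error("null pointer passed to rust");
      // Fake "tokens": one number per code point, NOT real BPE tokens.
      return Array.from(text, (ch) => ch.codePointAt(0));
    },
    free() {
      freed = true;
    },
  };
}

const counter = createTokenCounter(createFakeEncoding);
const a = counter.getTokens("hello world"); // creates the encoder
const b = counter.getTokens("hello world"); // reuses it
counter.dispose();                          // frees it once
const c = counter.getTokens("hello world"); // safely creates a new one
counter.dispose();

console.log({ created, lengths: [a.length, b.length, c.length] });
```

With the real package, the factory would presumably be () => new Tiktoken(cl100k_base.bpe_ranks, cl100k_base.special_tokens, cl100k_base.pat_str), as in the lite example above.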
Thanks. Why isn't free handled automatically, for example through a refactor?
/node_modules/@dqbd/tiktoken/lite/tiktoken_bg.cjs:375
    throw new Error(getStringFromWasm0(arg0, arg1));
    ^
Error: null pointer passed to rust
    at module.exports.__wbindgen_throw (/node_modules/@dqbd/tiktoken/lite/tiktoken_bg.cjs:375:11)
    at wasm://wasm/0030beca:wasm-function[788]:0x70f59
    at wasm://wasm/0030beca:wasm-function[786]:0x70f3f
    at wasm://wasm/0030beca:wasm-function[654]:0x6bdba
    at wasm://wasm/0030beca:wasm-function[147]:0x477e2
    at Tiktoken.encode (/node_modules/@dqbd/tiktoken/lite/tiktoken_bg.cjs:223:18)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Node.js v18.16.0