dqbd / tiktoken

JS port and JS/WASM bindings for openai/tiktoken
MIT License

How to import in nodejs server? #32

Closed djaffer closed 1 year ago

djaffer commented 1 year ago

/node_modules/@dqbd/tiktoken/lite/tiktoken_bg.cjs:375
    throw new Error(getStringFromWasm0(arg0, arg1));
    ^

Error: null pointer passed to rust
    at module.exports.__wbindgen_throw (/node_modules/@dqbd/tiktoken/lite/tiktoken_bg.cjs:375:11)
    at wasm://wasm/0030beca:wasm-function[788]:0x70f59
    at wasm://wasm/0030beca:wasm-function[786]:0x70f3f
    at wasm://wasm/0030beca:wasm-function[654]:0x6bdba
    at wasm://wasm/0030beca:wasm-function[147]:0x477e2
    at Tiktoken.encode (/node_modules/@dqbd/tiktoken/lite/tiktoken_bg.cjs:223:18)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Node.js v18.16.0

djaffer commented 1 year ago

Never mind, use the tiktoken-node package instead. You should clearly state that this package will not work with Node.js without module support, which makes it pointless for backend use.

dqbd commented 1 year ago

Hi @djaffer,

I'm not sure where the issue is, as Node.js is well supported by both the lite and the full-fledged variants. Could you please share a reproducible example?


Full-fledged example (tested on v18.16.0)

const { get_encoding } = require("@dqbd/tiktoken");
const encoding = get_encoding("gpt2");
const tokens = encoding.encode("noone is there");
encoding.free();

Lite example (tested on v18.16.0)

const { Tiktoken } = require("@dqbd/tiktoken/lite");
const cl100k_base = require("@dqbd/tiktoken/encoders/cl100k_base.json");

const encoding = new Tiktoken(
  cl100k_base.bpe_ranks,
  cl100k_base.special_tokens,
  cl100k_base.pat_str
);
const tokens = encoding.encode("hello world");
encoding.free();
console.log({ tokens });

djaffer commented 1 year ago

It gave a Rust null pointer error: Uncaught (in promise) Error: null pointer passed to rust

dqbd commented 1 year ago

"it gave rust null pointer error Uncaught (in promise) Error: null pointer passed to rust"

Could you please share a reproducible example? Thanks

bosunolanrewaju commented 1 year ago

I encountered the same error when I tried to encode a string after the encoding had been freed.

const { get_encoding } = require("@dqbd/tiktoken");
const encoding = get_encoding("gpt2");
const tokens = encoding.encode("noone is there");
encoding.free();
const otherTokens = encoding.encode("second encoding"); // <-- this throws Error: null pointer passed to rust

KitsonBroadhurst commented 1 year ago

I had the same issue. Moving encoding.free() to the end, after all of the calls to encoding.encode, appeared to solve it!
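
The ordering described above (all encode calls first, a single free last) can be sketched with a try/finally helper. This is only an illustration under stated assumptions: `withEncoding` is a hypothetical helper, not part of @dqbd/tiktoken, and `FakeEncoding` is a made-up stand-in that merely mimics the encode-after-free failure reported in this thread so the snippet runs without the package installed.

```javascript
// Hypothetical stand-in for a tiktoken encoder (assumption, not the real class).
// It only mimics the behaviour reported in this thread: encode() fails after free().
class FakeEncoding {
  constructor() {
    this.freed = false;
  }
  encode(text) {
    if (this.freed) throw new Error("null pointer passed to rust");
    // Not real BPE: one fake token id (the word length) per whitespace-separated word.
    return text.split(/\s+/).filter(Boolean).map((word) => word.length);
  }
  free() {
    this.freed = true;
  }
}

// Hypothetical helper: create an encoder, run all the work, then free exactly once.
function withEncoding(createEncoding, work) {
  const encoding = createEncoding();
  try {
    return work(encoding);
  } finally {
    encoding.free(); // runs even if work() throws
  }
}

const result = withEncoding(
  () => new FakeEncoding(),
  (encoding) => ({
    tokens: encoding.encode("noone is there"),
    otherTokens: encoding.encode("second encoding"),
  })
);
console.log(result);
```

With the real package, the factory argument would presumably be something like `() => get_encoding("gpt2")`, as in the full-fledged example earlier in the thread; the helper itself does not depend on which encoder it is given.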

djaffer commented 1 year ago

So the issue is that we have to reinitialize the encoding after freeing it. Is that good practice, or can we reuse it?

This works

function getTokens() {
  const encoding = new Tiktoken(
    cl100k_base.bpe_ranks,
    cl100k_base.special_tokens,
    cl100k_base.pat_str
  );
  const tokens = encoding.encode("hello world");
  encoding.free();
  console.log({ tokens });
  return tokens;
}

The version below does not work. I thought initializing multiple times was bad practice.

const encoding = new Tiktoken(
  cl100k_base.bpe_ranks,
  cl100k_base.special_tokens,
  cl100k_base.pat_str
);
function getTokens() {
  const tokens = encoding.encode("hello world");
  encoding.free(); // frees the shared encoder, so any later call to getTokens() fails
  console.log({ tokens });
  return tokens;
}

dqbd commented 1 year ago

Multiple initialisation is definitely fine, as seen in your first example. In the second example, however, the encoder is freed inside getTokens() and then accessed again on a later call, which is invalid.
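
If a single shared encoder is preferred over creating one per call, one possible approach is to keep free() out of the function and call it once when the encoder is no longer needed. The sketch below is an assumption-laden illustration, not the library's API: `GuardedEncoding` is a hypothetical wrapper that turns use-after-free into a clearer error, and `stubInner` is a made-up object standing in for a real encoder so the snippet runs on its own.

```javascript
// Hypothetical wrapper (not part of @dqbd/tiktoken): remembers whether free() was called.
class GuardedEncoding {
  constructor(inner) {
    this.inner = inner; // any object exposing encode(text) and free()
    this.freed = false;
  }
  encode(text) {
    if (this.freed) {
      throw new Error("encode() called after free(); create a new encoder instead");
    }
    return this.inner.encode(text);
  }
  free() {
    if (this.freed) return; // a second free() on the wrapper is a no-op
    this.freed = true;
    this.inner.free();
  }
}

// Made-up stand-in for a real encoder (assumption): character codes instead of BPE tokens.
const stubInner = {
  freeCalls: 0,
  encode(text) {
    return Array.from(text, (ch) => ch.charCodeAt(0));
  },
  free() {
    this.freeCalls += 1;
  },
};

// One shared encoder for the whole module; getTokens() never frees it.
const shared = new GuardedEncoding(stubInner);

function getTokens(text) {
  return shared.encode(text);
}

const first = getTokens("hello world");
const second = getTokens("hello world"); // still valid: nothing has been freed yet

shared.free(); // once, when the encoder is no longer needed
shared.free(); // ignored by the wrapper
```

Whether to share one encoder or create one per call is a trade-off the thread leaves open; both are valid as long as nothing touches the encoder after free().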

djaffer commented 1 year ago

Thanks. Why is free not automated by refactoring?