botisan-ai / gpt3-tokenizer

Isomorphic JavaScript/TypeScript Tokenizer for GPT-3 and Codex Models by OpenAI.
MIT License

Calling tokenizer.encode('toString') errors due to bpe function #17

Open jihan-yin opened 1 year ago

jihan-yin commented 1 year ago
import GPT3Tokenizer from 'gpt3-tokenizer';
const tokenizer = new GPT3Tokenizer({ type: 'gpt3' });
tokenizer.encode('toString');

will fail. This is because tokenizer.bpe('toString') returns the JavaScript function toString() instead of a string representing the token.

jihan-yin commented 1 year ago

This is because tokenizer.cache['toString'] evaluates to the native function toString() rather than a cached string. Not sure how this is happening.
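One likely explanation, assuming the cache is a plain JavaScript object (this is a guess from the behaviour, not confirmed against the library source): looking up a key the object does not own falls through to Object.prototype, which defines toString. A minimal sketch, where plainCache is a hypothetical stand-in for the tokenizer's cache:

```typescript
// Hypothetical stand-in for the tokenizer's internal cache (assumption:
// the real cache is a plain object literal like this one).
const plainCache: Record<string, string> = {};

// The key is held in a string variable so the lookup goes through the
// index signature, as it would for an arbitrary token.
const key: string = 'toString';

// Nothing was ever stored under 'toString', so the lookup walks the
// prototype chain and finds Object.prototype.toString.
const hit: unknown = plainCache[key];

const isInheritedFunction = typeof hit === 'function';
const isSameAsObjectProto = hit === Object.prototype.toString;
const isOwnProperty = Object.prototype.hasOwnProperty.call(plainCache, key);

console.log(isInheritedFunction, isSameAsObjectProto, isOwnProperty);
```

If that assumption holds, other inherited names such as 'constructor' or 'valueOf' would presumably misbehave in the same way.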

jihan-yin commented 1 year ago

Currently working around it by adding:

// @ts-ignore
tokenizer.cache['toString'] = 'toString';

encode still ends up with the native JavaScript function instead of token IDs, though, because tokenizer.encodings['toString'] also evaluates to the function for some reason.
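If the encodings table is also a plain object, the same prototype lookup would explain this too. A rough sketch of two ways a lookup table can avoid inherited keys; ownLookup, the sample encodings and safeCache are hypothetical names for illustration and are not part of gpt3-tokenizer:

```typescript
// Hypothetical helper: return a value only when the key is an own
// property of the table, ignoring anything inherited from Object.prototype.
function ownLookup<T>(table: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Made-up token ids, for illustration only.
const encodings: Record<string, number> = { hello: 1 };

const missing = ownLookup(encodings, 'toString'); // not an own key
const present = ownLookup(encodings, 'hello');    // own key

// Alternative: a Map only ever returns entries that were explicitly set.
const safeCache = new Map<string, string>();
const mapMiss = safeCache.get('toString');

console.log(missing, present, mapMiss);
```

Creating the tables with Object.create(null) would be a third option, since such objects have no prototype to inherit from.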