Stevenic / vectra

Vectra is a local vector database for Node.js with features similar to Pinecone, but built using local files.
MIT License
397 stars, 32 forks

Upgrade from gpt-3-encoder to gpt-tokenizer #45

Open corinagum opened 8 months ago

corinagum commented 8 months ago

https://github.com/niieani/gpt-tokenizer

The above package was originally a fork of gpt-3-encoder, but it has been ported to TypeScript and lets you select different encodings (e.g. the default is cl100k_base, but if you're using davinci you can switch to one of the 50k encodings such as r50k_base or p50k_base).

Teams AI just switched to gpt-tokenizer as well :) 👍🏼

Stevenic commented 6 months ago

Thanks @corinagum... I'm planning to create a whole new collection of tokenizers to help address this.

pelikhan commented 4 months ago

If you removed the hard-coded gpt3-tokenizer reference, anyone could easily bring their own tokenizer implementation.
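
To illustrate the suggestion above, here is a minimal sketch of what a pluggable tokenizer could look like. The `Tokenizer` interface, `CharCodeTokenizer`, and `chunkByTokens` are hypothetical names made up for this example; they are not taken from Vectra's source, and the toy tokenizer only stands in for a real BPE encoder.

```typescript
// Hypothetical interface (an assumption for illustration, not Vectra's API):
// any tokenizer that can round-trip text <-> token ids can be plugged in.
interface Tokenizer {
  encode(text: string): number[];
  decode(tokens: number[]): string;
}

// Toy stand-in for a real BPE tokenizer: one token per UTF-16 code unit.
class CharCodeTokenizer implements Tokenizer {
  encode(text: string): number[] {
    const tokens: number[] = [];
    for (let i = 0; i < text.length; i++) {
      tokens.push(text.charCodeAt(i));
    }
    return tokens;
  }

  decode(tokens: number[]): string {
    return tokens.map((t) => String.fromCharCode(t)).join("");
  }
}

// Splits text into chunks of at most `maxTokens` tokens each.
// The tokenizer is injected, so no specific package is hard-coded here.
function chunkByTokens(
  text: string,
  maxTokens: number,
  tokenizer: Tokenizer
): string[] {
  if (maxTokens < 1) {
    throw new RangeError("maxTokens must be at least 1");
  }
  const tokens = tokenizer.encode(text);
  const chunks: string[] = [];
  for (let i = 0; i < tokens.length; i += maxTokens) {
    chunks.push(tokenizer.decode(tokens.slice(i, i + maxTokens)));
  }
  return chunks;
}

const chunks = chunkByTokens("hello world", 4, new CharCodeTokenizer());
console.log(chunks);
```

With this shape, a thin wrapper around the `encode` and `decode` functions exported by gpt-tokenizer (or any other package) could presumably implement the same interface, so the calling code would not need to change when the tokenizer does.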