Stevenic / vectra

Vectra is a local vector database for Node.js with features similar to Pinecone, but built on local files.
MIT License
321 stars, 29 forks

Upgrade from gpt-3-encoder to gpt-tokenizer #45

Open: opened by corinagum 4 months ago

corinagum commented 4 months ago

https://github.com/niieani/gpt-tokenizer

The above package was originally a fork of gpt-3-encoder but has since been upgraded to TypeScript and lets you select different encodings (e.g. the default is cl100k_base, but if you're using a davinci model you can switch to p50k_base or r50k_base).
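To illustrate what selecting an encoding by name could look like on the consuming side, here is a minimal sketch. Everything in it is hypothetical: the `Tokenizer` interface, `TokenizerRegistry`, `CodeUnitTokenizer`, and `truncateToTokens` are illustrative names, not the API of vectra or gpt-tokenizer, and the stand-in tokenizer simply emits one token per UTF-16 code unit so the example runs without any dependency. In a real project the registered factory would presumably wrap the encode/decode functions of the chosen tokenizer package.

```typescript
// Hypothetical sketch: these names are illustrative and are not the API of
// vectra or gpt-tokenizer.
interface Tokenizer {
  encode(text: string): number[];
  decode(tokens: number[]): string;
}

type EncodingName = "cl100k_base" | "p50k_base" | "r50k_base";

// Stand-in tokenizer that emits one token per UTF-16 code unit. A real
// implementation would wrap the encode/decode functions of a BPE library.
class CodeUnitTokenizer implements Tokenizer {
  encode(text: string): number[] {
    return text.split("").map((ch) => ch.charCodeAt(0));
  }

  decode(tokens: number[]): string {
    return tokens.map((t) => String.fromCharCode(t)).join("");
  }
}

// Maps an encoding name to a factory that builds the matching tokenizer.
class TokenizerRegistry {
  private factories: Partial<Record<EncodingName, () => Tokenizer>> = {};

  register(name: EncodingName, factory: () => Tokenizer): void {
    this.factories[name] = factory;
  }

  // Defaults to cl100k_base, mirroring the default mentioned in the comment.
  get(name: EncodingName = "cl100k_base"): Tokenizer {
    const factory = this.factories[name];
    if (!factory) {
      throw new Error(`No tokenizer registered for encoding "${name}"`);
    }
    return factory();
  }
}

// Keeps at most maxTokens tokens of the text, then decodes back to a string.
function truncateToTokens(tokenizer: Tokenizer, text: string, maxTokens: number): string {
  return tokenizer.decode(tokenizer.encode(text).slice(0, maxTokens));
}

const registry = new TokenizerRegistry();
registry.register("cl100k_base", () => new CodeUnitTokenizer());
// With the stand-in tokenizer, 5 tokens are the first 5 characters.
console.log(truncateToTokens(registry.get(), "hello world", 5)); // prints "hello"
```

Keeping callers dependent only on the small `encode`/`decode` interface means the underlying package can be swapped without touching code that counts or truncates tokens.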

Teams AI just switched to gpt-tokenizer as well :) 👍🏼

Stevenic commented 2 months ago

Thanks @corinagum... I'm planning to create a whole new collection of tokenizers to help address this.