dqbd / tiktoken

JS port and JS/WASM bindings for openai/tiktoken
MIT License

Document different package options #48

Closed. transitive-bullshit closed this issue 1 day ago.

transitive-bullshit commented 1 year ago

It seems there are multiple NPM packages associated with this tiktoken port (@dqbd/tiktoken, js-tiktoken, and tiktoken), and I wasn't able to find the differences clearly documented anywhere.

Langchainjs seems to be using js-tiktoken (reference and associated commit https://github.com/hwchase17/langchainjs/commit/d60eae5995a0d414f7684fc9c113c91c712c7af0), so I'm going with that for now, but the readme on this project uses tiktoken instead of js-tiktoken, and @dqbd/tiktoken looks like it's still around.

@dqbd would love any clarity you can provide here, and thank you again for your amazing work on this project 🙏

Also, what does the js-tiktoken/lite version actually do differently than the other packages?

dqbd commented 1 year ago

Hello!

I got a little swamped with (school) work recently, so my apologies for the lack of documentation and clarity. I will update the README.md soon, but here is the gist of the changes and the rationale:

This repository maintains two packages.

The reason for porting tiktoken to JS is mainly the constraints of edge environments (large WASM bundle, the setup needed to get WASM working, etc.) and of toolchain-runtime combinations (#37). The issues are compounded when users are not using the package directly but rather as a dependency of another library such as LangchainJS (https://github.com/hwchase17/langchainjs/pull/1239).

The plan going forward is to converge the APIs of both libraries so that they are interchangeable, allowing isomorphic behaviour (#43), and to add appropriate documentation soon (with an additional PR for benchmarking both packages). Will close the issue after that is done :)

Hope that clears things up!
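To illustrate what "interchangeable" could mean for calling code, here is a minimal TypeScript sketch. It is only a sketch under assumptions: `TokenizerLike`, `ToyJsTokenizer`, `ToyWasmTokenizer`, and `countTokens` are hypothetical names made up for this example, and the toy backends are stand-ins that emit one token per UTF-16 code unit rather than real BPE tokens. Neither package is claimed to export this interface.

```typescript
// Hypothetical shared surface; not an interface exported by either package.
interface TokenizerLike {
  // ArrayLike covers both a plain number[] and a typed array such as Uint32Array.
  encode(text: string): ArrayLike<number>;
  // Optional: only backends that hold native/WASM memory need explicit cleanup.
  free?(): void;
}

// Toy stand-in for a pure JS backend: one token per UTF-16 code unit (NOT real BPE).
class ToyJsTokenizer implements TokenizerLike {
  encode(text: string): number[] {
    return text.split("").map((ch) => ch.charCodeAt(0));
  }
}

// Toy stand-in for a WASM-style backend: typed-array output plus explicit cleanup.
class ToyWasmTokenizer implements TokenizerLike {
  freed = false;

  encode(text: string): Uint32Array {
    return new Uint32Array(text.split("").map((ch) => ch.charCodeAt(0)));
  }

  free(): void {
    // A real WASM backend would release its memory here; the toy only records the call.
    this.freed = true;
  }
}

// Caller code that does not care which backend it was handed.
function countTokens(tokenizer: TokenizerLike, text: string): number {
  try {
    return tokenizer.encode(text).length;
  } finally {
    // Release resources when the backend exposes a cleanup hook.
    if (tokenizer.free) {
      tokenizer.free();
    }
  }
}
```

With a shape like this, a library could accept either backend and let the application decide which one to bundle, for example the pure JS one in edge environments where WASM setup is a problem.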

transitive-bullshit commented 1 year ago

First off, you rock @dqbd 🔥

This makes a ton of sense, and no worries about being swamped w/ school / work. Totally understand and it's all part of open source :)

Thanks for the thorough explanation – will update https://github.com/transitive-bullshit/compare-tokenizers and my other projects accordingly 🙏