deontologician / openai-api-rust

Rust client for OpenAI API
Apache License 2.0
104 stars, 26 forks

Convert between text and tokens easily #6

Open deontologician opened 3 years ago

deontologician commented 3 years ago

Currently a couple of the APIs talk in tokens, which is inconvenient. It would be nice if you could translate text into tokens and vice versa easily.

The rust_tokenizers crate has a function called from_file that allows instantiating the GPT-2 tokenizer given a couple of pretrained tokenizer files. These files are available from Hugging Face's website.

There is also an example in rust_bert of constructing a GPT-2 tokenizer. Ideally the tokenizer would be built lazily, so users of the library don't pay for it unless they need these features.
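A minimal sketch of what "built lazily" could look like, using only the standard library (`std::sync::OnceLock`, Rust 1.70+). `ToyTokenizer`, `LOADS`, and `tokenizer()` are hypothetical names for illustration, not part of this crate or of any tokenizer library. The toy tokenizer emits one token per UTF-8 byte, standing in for a real GPT-2 tokenizer whose expensive step would be reading the vocabulary and merges files.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the (pretend) expensive load has run.
pub static LOADS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a GPT-2 tokenizer: one token per UTF-8 byte.
pub struct ToyTokenizer;

impl ToyTokenizer {
    /// Assumption: a real implementation would read the pretrained
    /// vocabulary and merges files here, which is the costly part.
    fn load() -> Self {
        LOADS.fetch_add(1, Ordering::SeqCst);
        ToyTokenizer
    }

    /// Text to token IDs (here: the raw byte values).
    pub fn encode(&self, text: &str) -> Vec<u32> {
        text.bytes().map(u32::from).collect()
    }

    /// Token IDs back to text. IDs above 255 are truncated by the cast,
    /// which is acceptable only because this is a toy vocabulary.
    pub fn decode(&self, ids: &[u32]) -> String {
        let bytes: Vec<u8> = ids.iter().map(|&id| id as u8).collect();
        String::from_utf8_lossy(&bytes).into_owned()
    }
}

/// Returns the shared tokenizer, constructing it on first call only.
pub fn tokenizer() -> &'static ToyTokenizer {
    static TOKENIZER: OnceLock<ToyTokenizer> = OnceLock::new();
    TOKENIZER.get_or_init(ToyTokenizer::load)
}
```

Callers that never touch `tokenizer()` never trigger `load`, and repeated calls reuse the same instance, so `tokenizer().encode("hi")` followed by `tokenizer().decode(..)` pays the construction cost once.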

Where to use it

This looks most useful with the logit_bias feature, since the API requires you to send token IDs rather than actual strings. Since the example code is in Python, this is a bit of a barrier to users in Rust.
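To make the token-ID requirement concrete, here is a rough sketch of building a logit_bias payload once token IDs are in hand. `logit_bias_for` is a hypothetical helper, not an existing function in this crate. It assumes the API's documented shape: a JSON object whose keys are token IDs (as strings) and whose values are biases between -100 and 100.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: maps each token ID to the same bias value.
/// Keys are strings because JSON object keys must be strings.
pub fn logit_bias_for(token_ids: &[u32], bias: i8) -> BTreeMap<String, i8> {
    // The API accepts biases in the range -100..=100, so clamp defensively.
    let bias = bias.clamp(-100, 100);
    token_ids
        .iter()
        .map(|id| (id.to_string(), bias))
        .collect()
}
```

For example, `logit_bias_for(&[50256], -100)` yields a map equivalent to `{"50256": -100}`, the form the OpenAI documentation uses to suppress the `<|endoftext|>` token. With a tokenizer available, the IDs would come from encoding the user's text instead of being looked up by hand.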

deontologician commented 3 years ago

Apparently there is now just https://github.com/huggingface/tokenizers, which provides Rust tokenizers.