elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache License 2.0

Error loading gpt3.5 tokenizer #371

Closed by yujonglee 5 months ago

yujonglee commented 5 months ago

Bumblebee.load_tokenizer({:hf, "Xenova/gpt-3.5-turbo"})
** (ArgumentError) file not found, url: https://huggingface.co/Xenova/gpt-3.5-turbo/resolve/main/config.json, please specify the :module option

seanmor5 commented 5 months ago

Can you try:

Bumblebee.load_tokenizer({:hf, "Xenova/gpt-3.5-turbo"}, module: GPT2Tokenizer)

?

yujonglee commented 5 months ago

@seanmor5 I tried that, but I got:

iex(1)> Bumblebee.load_tokenizer({:hf, "Xenova/gpt-3.5-turbo"}, module: GPT2Tokenizer)
** (ArgumentError) unknown keys [:module] in [module: GPT2Tokenizer], the allowed keys are: [:type]
    (elixir 1.16.2) lib/keyword.ex:359: Keyword.validate!/2
    (bumblebee 0.5.3) lib/bumblebee.ex:896: Bumblebee.load_tokenizer/2
    iex:1: (file)

I am using cb26c2dce95c1c5e7bad4d9ba29115088abfdbe7.

jonatanklosko commented 5 months ago

The API changed; the option no longer takes a module. Try type: :gpt2 :)
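
For reference, a minimal sketch of the call with the :type option suggested in this thread. It assumes Bumblebee is already available in the session and that the tokenizer file for this repository can be fetched from the Hugging Face Hub; the sample text is only an illustration, and the exact output is not shown because it depends on the downloaded tokenizer.

```elixir
# Sketch only: assumes Bumblebee is a dependency of the current project and
# that network access to the Hugging Face Hub is available.
# The :type option replaces the older :module option mentioned in the error.
{:ok, tokenizer} =
  Bumblebee.load_tokenizer({:hf, "Xenova/gpt-3.5-turbo"}, type: :gpt2)

# apply_tokenizer/2 returns a map of tensors; "input_ids" holds the token ids.
# The input string here is an arbitrary example.
inputs = Bumblebee.apply_tokenizer(tokenizer, "Hello, world!")

IO.inspect(inputs["input_ids"], label: "token ids")
```

Passing type: :gpt2 tells Bumblebee which tokenizer implementation to use when the repository has no config.json to infer it from, which is the situation the first error message describes.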

yujonglee commented 5 months ago

@jonatanklosko Thanks!

For anyone interested in tokenization in Elixir: https://github.com/fastrepl/fastrepl/blob/da5ab17590f6fa35eccbd9fe7b15b75929eb6655/lib/fastrepl/tokenizer.ex#L10-L19