dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0

ValueError: Tokenizer class ChatGLMTokenizer does not exist or is not currently imported. #607

Open ryzn0518 opened 9 months ago

ryzn0518 commented 9 months ago

Describe the issue as clearly as possible:

Running the sample code below raises the error.

Steps/code to reproduce the bug:

```python
import outlines
from pydantic import BaseModel

class AnswerFormat(BaseModel):
    first_name: str
    last_name: str
    year_of_birth: int
    num_seasons_in_nba: int

model = outlines.models.transformers("chatglm3-6b", device="cuda:3")
generator = outlines.generate.json(model, AnswerFormat)
sequence = generator("Please give me information about Michael Jordan.")
```

Expected result:

AnswerFormat(first_name='Mike', last_name='Jordan', year_of_birth=1963, num_seasons_in_nba=15)

Error message:

```
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 7/7 [00:10<00:00,  1.51s/it]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 9
      6     year_of_birth: int
      7     num_seasons_in_nba: int
----> 9 model = outlines.models.transformers("chatglm3-6b", device="cuda:3")
     10 generator = outlines.generate.json(model, AnswerFormat)
     11 sequence = generator("Please give me information about Michael Jordan.")

File ~/.local/lib/python3.10/site-packages/outlines/models/transformers.py:221, in transformers(model_name, device, model_kwargs, tokenizer_kwargs)
    218     model_kwargs["device_map"] = device
    220 model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
--> 221 tokenizer = TransformerTokenizer(model_name, **tokenizer_kwargs)
    223 return Transformer(model, tokenizer)

File ~/.local/lib/python3.10/site-packages/outlines/models/transformers.py:132, in TransformerTokenizer.__init__(self, model_name, **kwargs)
    130 # TODO: Do something to make this hashable?
    131 self.kwargs = kwargs
--> 132 self.tokenizer = AutoTokenizer.from_pretrained(model_name, **kwargs)
    133 self.eos_token_id = self.tokenizer.eos_token_id
    134 self.eos_token = self.tokenizer.eos_token

File ~/.local/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:784, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    782         tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate)
    783     if tokenizer_class is None:
--> 784         raise ValueError(
    785             f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    786         )
    787     return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)

    789 # Otherwise we have to be creative.
    790 # if model is an encoder decoder, the encoder tokenizer class is used by default

ValueError: Tokenizer class ChatGLMTokenizer does not exist or is not currently imported.
```

Outlines/Python version information:

Version information

```
outlines == 0.0.25
transformers == 4.36.2
```

Context for the issue:

No response

lapp0 commented 9 months ago

Could you please try `outlines.models.transformers("chatglm3-6b", device="cuda:3", trust_remote_code=True)`?
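
If the flag isn't forwarded to the tokenizer that way, another thing worth trying, based on the `transformers(model_name, device, model_kwargs, tokenizer_kwargs)` signature visible in your traceback, is to pass `trust_remote_code` through both kwargs dicts explicitly so it reaches `AutoModelForCausalLM.from_pretrained` and `AutoTokenizer.from_pretrained`. A sketch, not tested against chatglm3-6b:

```python
def load_chatglm():
    # Hypothetical variant: forward trust_remote_code through the
    # model_kwargs / tokenizer_kwargs dicts that outlines.models.transformers
    # unpacks into the underlying from_pretrained calls. ChatGLM's custom
    # ChatGLMTokenizer lives in the model repo, so both loaders need the flag.
    import outlines  # imported lazily; calling this needs the model weights and a GPU

    return outlines.models.transformers(
        "chatglm3-6b",
        device="cuda:3",
        model_kwargs={"trust_remote_code": True},
        tokenizer_kwargs={"trust_remote_code": True},
    )
```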