THUDM / CodeGeeX2

CodeGeeX2: A More Powerful Multilingual Code Generation Model
https://codegeex.cn
Apache License 2.0
7.6k stars, 535 forks

AutoTokenizer fails to load #150

Closed grapewheel closed 8 months ago

grapewheel commented 8 months ago
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)  # raises the error below
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True, device='cuda').eval()

# remember adding a language tag for better performance
prompt = "# language: Python\n# write a bubble sort function\n"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_length=256, top_k=1)
response = tokenizer.decode(outputs[0])

print(response)

The error is as follows:

~/.cache/huggingface/modules/transformers_modules/THUDM/codegeex2-6b/3cb3f8fa305c8188c6c997d0be2ccc4b87ba6f7f/tokenization_chatglm.py in vocab_size(self)
    106     @property
    107     def vocab_size(self):
--> 108         return self.tokenizer.n_words
    109 
    110     def get_vocab(self):

AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'
grapewheel commented 8 months ago

It turns out that, as with ChatGLM2, transformers must be version 4.30.2. I suggest noting this in the documentation.
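
For anyone hitting the same AttributeError, a quick way to confirm the installed version before loading the tokenizer might look like the sketch below. It only assumes the 4.30.2 requirement reported in this thread; the helper names `is_required_version` and `installed_transformers_version` are made up for illustration and are not part of transformers or CodeGeeX2.

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported to work in this thread (an assumption taken from the comment above).
REQUIRED = "4.30.2"


def is_required_version(installed: str, required: str = REQUIRED) -> bool:
    """Return True when the installed version string matches the required one exactly."""
    return installed.strip() == required


def installed_transformers_version():
    """Return the installed transformers version, or None if the package is missing."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print(f"transformers is not installed; try: pip install transformers=={REQUIRED}")
    elif not is_required_version(found):
        print(f"transformers {found} found; this thread reports that {REQUIRED} is needed")
    else:
        print(f"transformers {found} matches the version reported to work")
```

If the check reports a different version, pinning it with `pip install transformers==4.30.2` is the fix described above. Whether other versions also work is not established in this thread.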