Open · ghost opened this issue 11 months ago
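Has this ever worked for you before? If it has, try updating tokenization_chatglm.py. If this is your first run, it is most likely a problem with the model files; try downloading the model again.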
Was this resolved? I'm running into the same problem: `AttributeError: 'ChatGLMTokenizer' object has no attribute 'build_prompt'`
When I run cli_demo.py I get the following error: `AttributeError: 'ChatGLMTokenizer' object has no attribute 'sp_tokenizer'. Did you mean: '_tokenize'?` I have also reloaded the model, but the error persists. What is going on?
It may be a problem with your transformers version; try `pip install transformers==4.33.0`.
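To confirm which transformers release is actually installed before and after applying the pin, you can query the package metadata. This is a minimal sketch using only the standard library (`importlib.metadata`); the helper names `parse_release` and `installed_transformers` are hypothetical, and the 4.33.0 threshold only reflects the version suggested in this thread, not a documented compatibility boundary.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Version suggested in this thread; an assumption, not an official compatibility boundary.
PINNED = (4, 33, 0)


def parse_release(ver: str) -> tuple[int, ...]:
    """Return the numeric release part, e.g. '4.40.2' -> (4, 40, 2), '4.33.0.dev0' -> (4, 33, 0)."""
    parts: list[int] = []
    for piece in ver.split(".")[:3]:
        # Keep only the leading digits of each component (drops suffixes like 'rc1').
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_transformers() -> str | None:
    """Installed transformers version string, or None if the package is not installed."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    ver = installed_transformers()
    if ver is None:
        print("transformers is not installed")
    elif parse_release(ver) > PINNED:
        print(f"transformers {ver} is newer than the suggested pin 4.33.0")
    else:
        print(f"transformers {ver} (at or below the suggested pin)")
```

Comparing integer tuples avoids the classic string-comparison trap where "4.4" would sort after "4.33".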
Resolved! The problem occurs because `self.sp_tokenizer` is set after calling `super().__init__()`. Looking at the traceback, `super().__init__()` calls the parent class's `_add_tokens` method, which in turn calls `self.get_vocab`. `get_vocab` is overridden in the subclass `ChatGLMTokenizer`, and that override uses `self.sp_tokenizer`. At that point, however, `self.sp_tokenizer` has not been defined yet.

The solution is to set `self.sp_tokenizer` before calling `super().__init__()`.
Before:

```python
class ChatGLMTokenizer(PreTrainedTokenizer):
    ...
    def __init__(...) -> None:
        # May call self.get_vocab(), which needs self.sp_tokenizer -> AttributeError
        super().__init__(...)
        ...
        self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
```

After:

```python
class ChatGLMTokenizer(PreTrainedTokenizer):
    ...
    def __init__(...) -> None:
        # Create the SentencePiece wrapper first so get_vocab() works during base-class init
        self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
        super().__init__(...)
        ...
        # Old location, now commented out:
        # self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
```
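The initialization-order problem and its fix can be reproduced without transformers at all. In this sketch, `FakePreTrainedTokenizer`, `FakeSPTokenizer`, and the `added_vocab_size` attribute are hypothetical stand-ins: the base constructor calls `_add_tokens`, which calls `get_vocab`, mirroring the call chain described above. It is not the real `PreTrainedTokenizer` implementation.

```python
class FakePreTrainedTokenizer:
    """Hypothetical stand-in: its __init__ calls back into get_vocab() via _add_tokens()."""

    def __init__(self, **kwargs):
        self._add_tokens()

    def _add_tokens(self):
        # The base class consults the (subclass-defined) vocabulary during construction.
        self.added_vocab_size = len(self.get_vocab())

    def get_vocab(self):
        raise NotImplementedError


class FakeSPTokenizer:
    """Hypothetical stand-in for ChatGLM's SPTokenizer."""

    def __init__(self, vocab):
        self.vocab = vocab


class BrokenTokenizer(FakePreTrainedTokenizer):
    def __init__(self, vocab):
        super().__init__()  # calls get_vocab() ...
        self.sp_tokenizer = FakeSPTokenizer(vocab)  # ... before this attribute exists

    def get_vocab(self):
        return dict(self.sp_tokenizer.vocab)


class FixedTokenizer(FakePreTrainedTokenizer):
    def __init__(self, vocab):
        self.sp_tokenizer = FakeSPTokenizer(vocab)  # set first
        super().__init__()  # get_vocab() can now use self.sp_tokenizer

    def get_vocab(self):
        return dict(self.sp_tokenizer.vocab)


if __name__ == "__main__":
    vocab = {"<pad>": 0, "<unk>": 1}
    try:
        BrokenTokenizer(vocab)
    except AttributeError as err:
        print("broken:", err)
    print("fixed:", FixedTokenizer(vocab).added_vocab_size)
```

The broken class fails with the same kind of `AttributeError` reported in this thread, while the fixed ordering constructs normally. The general rule: any attribute an overridden method relies on must exist before a base-class constructor that may call that method.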
This error does not occur with transformers==4.33.0, but it is raised with the latest version, 4.40.2; the cause is a change to the `PreTrainedTokenizer` class.
Details: https://zhuanlan.zhihu.com/p/697342575
Is there an existing issue for this?
Current Behavior
The following error is reported during training:
Please advise.
Expected Behavior
No response
Steps To Reproduce
1. Windows environment; modified train.sh as follows:
2. The contents of train.json and dev.json are as follows:
Environment
Anything else?
No response