THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model
Apache License 2.0
4.08k stars 415 forks

Error raised in create_dataset_function during fine-tuning #202

Open sunxiaoyu12 opened 1 year ago

sunxiaoyu12 commented 1 year ago

Fine-tuning fails with the following error: [screenshot]

It looks like the failure comes from the call to get_tokenizer in the sat library. Is there a way to load a local tokenizer instead?

1049451037 commented 1 year ago

Download the tokenizer-related files from this repository to a local directory (you do not need the model weight files): https://huggingface.co/THUDM/chatglm-6b
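For the download step, one option (an assumption, not part of the original reply) is huggingface_hub's `snapshot_download` with `allow_patterns`, which fetches only files matching glob-style patterns and so skips the large `*.bin` weight shards. The pattern list and the helper names `TOKENIZER_PATTERNS`, `is_tokenizer_file`, and `download_tokenizer` are hypothetical, and the patterns assume the file layout of THUDM/chatglm-6b (tokenizer_config.json, ice_text.model, tokenization_chatglm.py).

```python
import fnmatch

# Hypothetical patterns, assuming the THUDM/chatglm-6b file layout:
# *.json covers tokenizer_config.json (and config.json), *.py covers
# tokenization_chatglm.py, *.model covers the SentencePiece file ice_text.model.
# The weight shards (pytorch_model-*.bin) match none of these.
TOKENIZER_PATTERNS = ["*.json", "*.py", "*.model"]


def is_tokenizer_file(filename: str) -> bool:
    """Return True if `filename` matches one of TOKENIZER_PATTERNS (glob-style)."""
    return any(fnmatch.fnmatch(filename, pattern) for pattern in TOKENIZER_PATTERNS)


def download_tokenizer(local_dir: str) -> str:
    """Download only the pattern-matched files into `local_dir`; return the snapshot path."""
    # Imported lazily so is_tokenizer_file() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="THUDM/chatglm-6b",
        allow_patterns=TOKENIZER_PATTERNS,
        local_dir=local_dir,
    )
```

Calling `download_tokenizer("./chatglm-6b-tokenizer")` would give a directory to use in place of 'your/path/' below. The `*.py` pattern also pulls in small files such as modeling_chatglm.py, which is harmless for loading the tokenizer.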

Then replace this line with the code below: https://github.com/THUDM/VisualGLM-6B/blob/main/finetune_visualglm.py#L157

from transformers import AutoTokenizer
# 'your/path/' is the local directory holding the downloaded tokenizer files
tokenizer = AutoTokenizer.from_pretrained('your/path/', trust_remote_code=True)

and delete this line: https://github.com/THUDM/VisualGLM-6B/blob/main/finetune_visualglm.py#L181
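If you want the replacement at line 157 to fail with a clear message when the local directory is incomplete, it can be wrapped in a small helper. This is a sketch under assumptions: `load_local_tokenizer`, `missing_tokenizer_files`, and `REQUIRED_FILES` are hypothetical names, and the file list assumes the ChatGLM-6B tokenizer files; the actual load is the same `AutoTokenizer.from_pretrained(path, trust_remote_code=True)` call as above.

```python
import os

# Hypothetical list of files the ChatGLM-6B tokenizer needs, assumed from the
# THUDM/chatglm-6b repository layout; adjust it if your copy differs.
REQUIRED_FILES = ("tokenizer_config.json", "ice_text.model", "tokenization_chatglm.py")


def missing_tokenizer_files(path: str) -> list:
    """Return the entries of REQUIRED_FILES that are not regular files under `path`."""
    return [name for name in REQUIRED_FILES if not os.path.isfile(os.path.join(path, name))]


def load_local_tokenizer(path: str):
    """Load the tokenizer from a local directory instead of calling sat's get_tokenizer."""
    missing = missing_tokenizer_files(path)
    if missing:
        raise FileNotFoundError(f"{path} is missing tokenizer files: {missing}")
    # Imported here so the file check above runs even without transformers installed.
    from transformers import AutoTokenizer

    # trust_remote_code=True allows transformers to run tokenization_chatglm.py from `path`.
    return AutoTokenizer.from_pretrained(path, trust_remote_code=True)
```

In finetune_visualglm.py this would become `tokenizer = load_local_tokenizer('your/path/')`; the explicit check is only a convenience and does not change what gets loaded.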

sunxiaoyu12 commented 1 year ago

Thanks a lot! It works now!