PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Question]: Training OPT-13b with the Hugging Face configuration fails with an error that GPT2Tokenizer is not supported #8279

Closed: runzhech closed this issue 6 months ago

runzhech commented 7 months ago

Please describe your question

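I downloaded the OPT-13b model weights and all of its configuration files from https://huggingface.co/facebook/opt-13b/blob/main/tokenizer_config.json, then started training with `python -u -m paddle.distributed.launch --gpus "0,1" finetune_generation.py ./opt/lora_argument.json`. Training fails with an error saying that GPT2Tokenizer is not supported. I checked PaddleNLP/paddlenlp/transformers/auto/tokenizer.py and could not find a tokenizer mapped to the OPT model there either. Which tokenizer should I use for training?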

w5688414 commented 7 months ago

Try changing the tokenizer in the config to this one:

https://github.com/PaddlePaddle/PaddleNLP/blob/c3ec984db73cc2b163f89a6ffb3276c9d9d20dd9/paddlenlp/transformers/gpt/tokenizer.py#L304
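A minimal sketch of that config change, under two assumptions that are not confirmed in the thread: that the error is triggered by a `"tokenizer_class": "GPT2Tokenizer"` entry in the downloaded `tokenizer_config.json`, and that the class defined at the linked line is `GPTTokenizer` (check the linked source before relying on this name). The helper `patch_tokenizer_class` is hypothetical; editing the JSON file by hand achieves the same thing.

```python
import json
from pathlib import Path

# ASSUMPTION: the PaddleNLP class at the linked line of
# paddlenlp/transformers/gpt/tokenizer.py is named "GPTTokenizer".
PADDLE_TOKENIZER_CLASS = "GPTTokenizer"


def patch_tokenizer_class(config_path, new_class=PADDLE_TOKENIZER_CLASS):
    """Hypothetical helper: overwrite "tokenizer_class" in a tokenizer_config.json.

    All other keys are preserved. Returns the previous value,
    or None if the key was not present.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    old_class = config.get("tokenizer_class")
    # Replace (or add) the tokenizer class so the loader resolves a class it knows.
    config["tokenizer_class"] = new_class
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return old_class
```

Call it once on the local copy of the file, e.g. `patch_tokenizer_class("path/to/opt-13b/tokenizer_config.json")`, then rerun `finetune_generation.py`. Returning the old value makes it easy to confirm what was actually replaced.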