Morizeyao / GPT2-Chinese

Chinese version of GPT2 training code, using BERT tokenizer.
MIT License

Error in train.py with input value error #226

Open yiyuexiong opened 3 years ago

yiyuexiong commented 3 years ago

Hi guys. I recently used the ufoym/deepo:all-py36-cu100 Docker image to train the example models for testing. With the required packages installed, training always fails with the same error, whether I use the baike or the news2016zh JSON training data:

File "/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py", line 432, in get_input_ids
f"Input {text} is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers."

ValueError: Input {'qid': 'qid_5982723620932473219', 'category': '教育/科学-理工学科-地球科学', 'title': '人站在地球上为什么没有头朝下的感觉 ', 'desc': '', 'answer': '地球上重力作用一直是指向球心的,因此\r\n只要头远离球心,人们就回感到头朝上。'} is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.

The input value shown in the error changes when I change the batch_size.

My GPU is an RTX6000 with 24 GB of memory. I've been stuck on this error and have been looking for a solution for weeks, but have found nothing. Could anyone please help me fix this issue? Thank you!
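
For reference, the ValueError says the tokenizer was handed a whole record (a dict with keys such as 'title', 'desc' and 'answer') where it accepts only a string, a list/tuple of strings, or a list/tuple of integers. Below is a minimal sketch of one possible workaround: flatten each record into a plain string before it reaches the tokenizer. It assumes the corpus file stores one JSON object per line and that the fields of interest are the ones visible in the error above. The helper names `record_to_text` and `convert_lines` are hypothetical and are not part of GPT2-Chinese or transformers.

```python
import json


def record_to_text(record, fields=("title", "desc", "answer")):
    """Join the non-empty string fields of one record into a single string.

    The field names are an assumption taken from the record shown in the
    error message; adjust them for other corpora.
    """
    parts = []
    for key in fields:
        value = record.get(key, "")
        if isinstance(value, str) and value.strip():
            parts.append(value.strip())
    return "\n".join(parts)


def convert_lines(lines, fields=("title", "desc", "answer")):
    """Turn an iterable of JSON-object lines into a list of plain strings."""
    texts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        text = record_to_text(json.loads(line), fields)
        if text:
            texts.append(text)
    return texts


# Small illustrative input (made-up record, same shape as the one in the error).
sample = [
    '{"qid": "qid_1", "title": "Why is the sky blue? ", "desc": "", '
    '"answer": "Rayleigh scattering."}'
]
texts = convert_lines(sample)
```

Every element of `texts` is then a `str`, which is one of the input types the error message lists as valid. Writing the list out with `json.dump(texts, f, ensure_ascii=False)` would give a JSON list of strings that can be inspected before training.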