OpenBMB / CPM-Bee

A Chinese-English bilingual base large language model with ten billion parameters

Trying to fine-tune, but data preprocessing fails with an error; could someone help figure out what's wrong? #93

Closed hopeforus closed 1 year ago

hopeforus commented 1 year ago

(cpmbee) hope@hope-08:~/work/CPM-Bee/src$ python preprocess_dataset.py --input ccpm_example/bee_data --output_path ccpm_example/bin_data --output_name ccpm_data
ccpm_example/bee_data/beetry.json: 25%|███████████████▌ | 1/4 [00:00<00:00, 10645.44it/s]
Error while writing file
Traceback (most recent call last):
  File "/home/hope/work/CPM-Bee/src/preprocess_dataset.py", line 44, in <module>
    main()
  File "/home/hope/work/CPM-Bee/src/preprocess_dataset.py", line 31, in main
    data = json.loads(line)
  File "/home/hope/miniconda3/envs/cpmbee/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/hope/miniconda3/envs/cpmbee/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/hope/miniconda3/envs/cpmbee/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 11 (char 10)

hopeforus commented 1 year ago

It was a problem with the format of the training text. Already resolved.
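For anyone hitting the same traceback: it shows preprocess_dataset.py calling json.loads(line) on each input line, so every line of the input file apparently has to be a complete JSON value on its own. The thread does not say which line or character was wrong in beetry.json, so the following is only a minimal sketch for locating such lines before running the preprocessing step. The helper name find_bad_json_lines is hypothetical and is not part of CPM-Bee.

```python
import json
import sys


def find_bad_json_lines(lines):
    """Return (line_number, message, column) for each line json.loads rejects.

    Hypothetical helper, not part of CPM-Bee. It mirrors the
    json.loads(line) call visible in the traceback, one line at a time.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            # err.msg and err.colno are the same fields that appear in
            # "Expecting value: line 1 column 11 (char 10)".
            problems.append((lineno, err.msg, err.colno))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed): python check_jsonl.py ccpm_example/bee_data/beetry.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for lineno, msg, colno in find_bad_json_lines(f):
            print(f"line {lineno}: {msg} at column {colno}")
```

The column reported for each line is the position where the JSON parser gave up, which is usually at or just after the malformed token (for example an unquoted or single-quoted value, or a record split across several lines).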