PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Apache License 2.0
40.26k stars 7.45k forks

PP-Structure key information extraction: training on a VAT invoice dataset fails on both GPU and CPU #13143

Closed zhongmy20230403 closed 1 week ago

zhongmy20230403 commented 2 weeks ago

Problem Description

Training on a VAT invoice dataset fails with an error.
Command: python tools/train.py -c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml
Error: [image]

Runtime Environment

Reproduction Code

python tools/train.py -c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml

Complete Error Message

[2024/06/20 17:12:45] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 2 iterations
eval model:: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "tools/train.py", line 255, in <module>
    main(config, device, logger, vdl_writer, seed)
  File "tools/train.py", line 208, in main
    program.train(
  File "F:\DIIT\28ocr\src\PaddleOCR\tools\program.py", line 452, in train
    cur_metric = eval(
  File "F:\DIIT\28ocr\src\PaddleOCR\tools\program.py", line 651, in eval
    metric = eval_class.get_metric()
  File "F:\DIIT\28ocr\src\PaddleOCR\ppocr\metrics\distillation_metric.py", line 60, in get_metric
    for key in self.metrics:
TypeError: 'NoneType' object is not iterable
eval model:: 0%|

Possible solutions

Appendix

jingsongliujing commented 2 weeks ago

Check whether your dataset contains any empty labels or empty files.
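
A check like the one suggested above could be sketched as follows. This is only an illustration, not a PaddleOCR tool: `find_empty_entries` is a hypothetical helper, and it assumes each line of the label file has the form `image_name<TAB>JSON list of annotations`. Adjust the parsing if your label file is laid out differently.

```python
import json
import os


def find_empty_entries(label_path, image_root=""):
    """Return (line_number, problem) pairs for suspicious label entries.

    Hypothetical helper. Assumes one sample per line in the form
    "image_name<TAB>JSON list"; this layout is an assumption.
    """
    problems = []
    with open(label_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                problems.append((line_no, "blank line"))
                continue
            parts = line.split("\t", 1)
            if len(parts) != 2 or not parts[1].strip():
                problems.append((line_no, "missing annotation"))
                continue
            image_name, annotation = parts
            try:
                items = json.loads(annotation)
            except json.JSONDecodeError:
                problems.append((line_no, "invalid JSON"))
                continue
            if not items:
                problems.append((line_no, "empty annotation list"))
            # An image that is absent or zero bytes long cannot be loaded.
            image_path = os.path.join(image_root, image_name)
            if not os.path.isfile(image_path) or os.path.getsize(image_path) == 0:
                problems.append((line_no, "missing or empty image file"))
    return problems
```

Calling `find_empty_entries("train.json", "path/to/images")` (both paths are placeholders) returns an empty list when nothing suspicious is found.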

zhongmy20230403 commented 1 week ago

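Resolved: my dataset is small, and the eval batch_size_per_card was set to a value larger than the number of samples in my dataset, so evaluation effectively had only one iteration. The check in the code evaluated to >= 0 at that point, so the loop hit break immediately. There are two solutions:
Option 1: set batch_size_per_card to a smaller value, with a minimum of the dataset size + 1;
Option 2: change the >= in the code to >.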
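
The failure mode described in the resolution above can be illustrated with a small stand-alone model. This is not PaddleOCR's code: `batches_evaluated` is a hypothetical function, and it assumes the loop limit equals the number of batches minus one, which would be consistent with the reported `>= 0` comparison when the progress bar shows a single batch (`0/1`).

```python
import math


def batches_evaluated(num_samples, batch_size, strict=False):
    """Count the batches a simplified eval loop would process.

    Hypothetical model of the behaviour reported in this thread,
    not the actual PaddleOCR implementation.
    """
    num_batches = math.ceil(num_samples / batch_size)
    max_iter = num_batches - 1  # assumption, see the paragraph above
    processed = 0
    for idx in range(num_batches):
        # strict=False models the original ">=" check,
        # strict=True models the ">" check from Option 2.
        should_stop = idx > max_iter if strict else idx >= max_iter
        if should_stop:
            break
        processed += 1
    return processed


# 10 samples, batch size 16: one batch, and ">=" stops before it is scored.
print(batches_evaluated(10, 16))               # 0
# A smaller batch size yields several batches, so some are scored.
print(batches_evaluated(10, 4))                # 2
# With ">" the single batch is scored.
print(batches_evaluated(10, 16, strict=True))  # 1
```

When no batch is processed, nothing is accumulated for the metric, which would match the `'NoneType' object is not iterable` error raised in `get_metric`.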