PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Apache License 2.0
39.75k stars 7.38k forks

Text detection training keeps reporting errors, but the data files check out fine #12081

Closed Ghaz1i closed 4 weeks ago

Ghaz1i commented 1 month ago

Please provide the following information to quickly locate the problem

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\14.jpg [{"transcription": "0028.0", "points": [[467, 540], [1062, 546], [1061, 706], [466, 699]], "difficult": false}] , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in getitem data["ext_data"] = self.get_ext_data() File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\07.jpg [{"transcription": "0010.0", "points": [[500, 343], [1008, 339], [1009, 478], [501, 482]], "difficult": false}] , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in getitem data["ext_data"] = self.get_ext_data() File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\30.jpg [{"transcription": "0040.0", "points": [[579, 327], [1091, 318], [1093, 458], [581, 466]], "difficult": false}] , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in getitem data["ext_data"] = self.get_ext_data() File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\23.jpg [{"transcription": "0037.0", "points": [[505, 216], [1156, 216], [1156, 402], [505, 402]], "difficult": false}] , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in getitem data["ext_data"] = self.get_ext_data() File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\02.jpg [{"transcription": "0027.0", "points": [[521, 233], [1059, 242], [1057, 389], [519, 380]], "difficult": false}] , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in getitem data["ext_data"] = self.get_ext_data() File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\18.jpg [{"transcription": "0017.0", "points": [[457, 199], [1109, 207], [1106, 405], [455, 396]], "difficult": false}] , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in getitem data["ext_data"] = self.get_ext_data() File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data label = substr[1] IndexError: list index out of range

[2024/05/09 14:51:09] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\14.jpg [{"transcription": "0028.0", "points": [[467, 540], [1062, 546], [1061, 706], [466, 699]], "difficult": false}] , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in getitem data["ext_data"] = self.get_ext_data() File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data label = substr[1] IndexError: list index out of range

[2024/05/09 14:51:30] ppocr INFO: epoch: [1/1200], global_step: 2, lr: 0.000014, dml_thrink_maps_0: 0.901465, loss: 20.252010, DBLoss_Student_loss_shrink_maps: 4.710961, DBLoss_Student_loss_threshold_maps: 4.071359, DBLoss_Student_loss_binary_maps: 0.931676, DBLoss_Student_loss_cbn: 0.000000, DBLoss_Student2_loss_shrink_maps: 4.662602, DBLoss_Student2_loss_threshold_maps: 4.037912, DBLoss_Student2_loss_binary_maps: 0.936035, DBLoss_Student2_loss_cbn: 0.000000, avg_reader_cost: 0.19312 s, avg_batch_cost: 23.10311 s, avg_samples: 2.0, ips: 0.08657 samples/s, eta: 5 days, 18:36:20 [2024/05/09 14:51:30] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:51:30] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\02.jpg [{"transcription": "0027.0", "points": [[521, 233], [1059, 242], [1057, 389], [519, 380]], "difficult": false}] , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in getitem data["ext_data"] = self.get_ext_data() File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data label = substr[1] IndexError: list index out of range

[2024/05/09 14:51:30] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:51:30] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:51:30] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:51:54] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:51:54] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:51:54] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:51:54] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:51:54] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\20.jpg [{"transcription": "0027.0", "points": [[491, 238], [1128, 249], [1126, 429], [488, 418]], "difficult": false}] , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in getitem data["ext_data"] = self.get_ext_data() File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data label = substr[1] IndexError: list index out of range

[2024/05/09 14:52:16] ppocr INFO: epoch: [1/1200], global_step: 4, lr: 0.000042, dml_thrink_maps_0: 0.856502, loss: 20.086422, DBLoss_Student_loss_shrink_maps: 4.733825, DBLoss_Student_loss_threshold_maps: 3.983587, DBLoss_Student_loss_binary_maps: 0.940934, DBLoss_Student_loss_cbn: 0.000000, DBLoss_Student2_loss_shrink_maps: 4.737186, DBLoss_Student2_loss_threshold_maps: 3.955768, DBLoss_Student2_loss_binary_maps: 0.948461, DBLoss_Student2_loss_cbn: 0.000000, avg_reader_cost: 0.00000 s, avg_batch_cost: 22.83796 s, avg_samples: 2.0, ips: 0.08757 samples/s, eta: 5 days, 17:47:51 [2024/05/09 14:52:16] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:52:16] ppocr ERROR: When parsing line , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in getitem label = substr[1] IndexError: list index out of range

[2024/05/09 14:52:16] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\09.jpg [{"transcription": "0002.0", "points": [[456, 455], [1073, 455], [1073, 622], [456, 622]], "difficult": false}] , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in getitem data["ext_data"] = self.get_ext_data() File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data label = substr[1] IndexError: list index out of range

[2024/05/09 14:52:16] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\10.jpg [{"transcription": "0008.0", "points": [[438, 431], [1034, 431], [1034, 596], [438, 596]], "difficult": false}] , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in getitem data["ext_data"] = self.get_ext_data() File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data label = substr[1] IndexError: list index out of range

[2024/05/09 14:52:16] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\03.jpg [{"transcription": "0007.0", "points": [[612, 379], [1087, 379], [1087, 508], [612, 508]], "difficult": false}] , error happened with msg: Traceback (most recent call last): File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in getitem data["ext_data"] = self.get_ext_data() File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data label = substr[1] IndexError: list index out of range

UserWangZz commented 1 month ago

G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\03.jpg [{"transcription": "0007.0", "points": [[612, 379], [1087, 379], [1087, 508], [612, 508]], "difficult": false}] Check whether the separator between the image path and the label is \t.
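
One way to run that check over a whole label file is sketched below. The traceback fails at `label = substr[1]`, which suggests each line is split on a delimiter and must yield two fields. The function name `find_bad_label_lines` is hypothetical, and the default tab delimiter is an assumption based on the advice in this thread. The sketch also flags blank lines, since several log entries show an empty line being parsed.

```python
import json


def find_bad_label_lines(label_file, delimiter="\t"):
    """Return (line_number, reason) pairs for lines that would not parse."""
    problems = []
    with open(label_file, encoding="utf-8") as handle:
        for number, raw in enumerate(handle, start=1):
            line = raw.rstrip("\r\n")
            if not line.strip():
                # An empty line has no second field at all.
                problems.append((number, "blank line"))
                continue
            fields = line.split(delimiter)
            if len(fields) != 2:
                problems.append((number, f"expected 2 fields, got {len(fields)}"))
                continue
            try:
                # The second field should be a JSON list of annotations.
                json.loads(fields[1])
            except json.JSONDecodeError:
                problems.append((number, "label is not valid JSON"))
    return problems
```

Calling `find_bad_label_lines("train.txt")` on the training label file returns an empty list when every line splits cleanly into a path and a JSON label.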

lili-changjiang commented 1 month ago

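Check how many epochs you have configured. I get this error when the epoch count is low; after setting it to 500 the error no longer appears.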

wentao-uw commented 1 month ago

The training data format may be wrong: [{"transcription": "0010.0", "points": [[500, 343], [1008, 339], [1009, 478], [501, 482]], "difficult": false}]. It should be image_path \t [{"label": "xxx", "transcription": "xxx", "points": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]}]
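
As a rough illustration of that layout, the sketch below builds one label line with a real tab character between the path and the JSON list. The helper name `format_label_line` and the shortened relative path are made up for the example; the annotation values are copied from the log above.

```python
import json


def format_label_line(image_path, annotations):
    """Build one label line: image path, a real tab, then the JSON list."""
    # ensure_ascii=False keeps non-ASCII transcriptions readable in the file.
    label = json.dumps(annotations, ensure_ascii=False)
    return f"{image_path}\t{label}"


line = format_label_line(
    "train_data/det/train/07.jpg",  # illustrative path, not the poster's
    [{"transcription": "0010.0",
      "points": [[500, 343], [1008, 339], [1009, 478], [501, 482]],
      "difficult": False}],
)
```

Generating the separator in code, rather than typing it in an editor, avoids the case where an editor silently converts a tab into spaces.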

564142183 commented 1 month ago

@wentao-uw I added \t to the training data format, but it still reports an error:

[2024/05/13 00:49:51] ppocr ERROR: When parsing line /PaddleOCR/train_data/det/train/0009.jpg \t [{"transcription": "熊杼", "points": [[95, 55], [139, 55], [139, 80], [95, 80]], "difficult": false, "key_cls": "name"}, {"transcription": "性别机器人民族汉", "points": [[45, 88], [215, 88], [215, 109], [45, 109]], "difficult": false, "key_cls": "sex"}, {"transcription": "2013年4月5日", "points": [[89, 116], [244, 116], [244, 141], [89, 141]], "difficult": false, "key_cls": "birth"}, {"transcription": "广东省广州某某666号—自制数据集", "points": [[87, 149], [279, 149], [279, 197], [87, 197]], "difficult": false, "key_cls": "adress"}, {"transcription": "661546442301175555", "points": [[145, 236], [312, 236], [312, 258], [145, 258]], "difficult": false, "key_cls": "card"}]
, error happened with msg: Traceback (most recent call last):
  File "/PaddleOCR/ppocr/data/simple_dataset.py", line 153, in __getitem__
    label = substr[1]
IndexError: list index out of range

Exception in thread Thread-2 (_thread_loop):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.py", line 603, in _thread_loop
    batch = self._get_data()
Traceback (most recent call last):
  File "/PaddleOCR/tools/train.py", line 255, in <module>
    main(config, device, logger, vdl_writer, seed)
  File "/PaddleOCR/tools/train.py", line 208, in main
    program.train(
  File "/PaddleOCR/tools/program.py", line 304, in train
    for idx, batch in enumerate(train_dataloader):
  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.py", line 826, in __next__
    self._reader.read_next_list()[0]
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at ../paddle/fluid/operators/reader/blocking_queue.h:171)

  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.py", line 752, in _get_data
    batch.reraise()
  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/worker.py", line 187, in reraise
    raise self.exc_type(msg)
RecursionError: DataLoader worker(0) caught RecursionError with message:
Traceback (most recent call last):
  File "/PaddleOCR/ppocr/data/simple_dataset.py", line 153, in __getitem__
    label = substr[1]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/worker.py", line 372, in _worker_loop
    batch = fetcher.fetch(indices)
  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/fetcher.py", line 77, in fetch
    data.append(self.dataset[idx])
  File "/PaddleOCR/ppocr/data/simple_dataset.py", line 177, in __getitem__
    return self.__getitem__(rnd_idx)
  File "/PaddleOCR/ppocr/data/simple_dataset.py", line 177, in __getitem__
    return self.__getitem__(rnd_idx)
  File "/PaddleOCR/ppocr/data/simple_dataset.py", line 177, in __getitem__
    return self.__getitem__(rnd_idx)
  [Previous line repeated 967 more times]
  File "/PaddleOCR/ppocr/data/simple_dataset.py", line 164, in __getitem__
    self.logger.error(
  File "/usr/lib/python3.10/logging/__init__.py", line 1506, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
    self.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
    self.emit(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1218, in emit
    StreamHandler.emit(self, record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1100, in emit
    msg = self.format(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 943, in format
    return fmt.format(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 679, in format
    if self.usesTime():
  File "/usr/lib/python3.10/logging/__init__.py", line 647, in usesTime
    return self._style.usesTime()
  File "/usr/lib/python3.10/logging/__init__.py", line 424, in usesTime
    return self._fmt.find(self.asctime_search) >= 0
RecursionError: maximum recursion depth exceeded while calling a Python object
UserWangZz commented 1 month ago

/PaddleOCR/train_data/det/train/0009.jpg \t [{"transcription": "熊杼", "points": [[95, 55], [139, 55], [139, 80], [95, 80]], "difficult": false, "key_cls": "name"}, {"transcription": "性别机器人民族汉", "points": [[45, 88], [215, 88], [215, 109], [45, 109]], "difficult": false, "key_cls": "sex"}, {"transcription": "2013年4月5日", "points": [[89, 116], [244, 116], [244, 141], [89, 141]], "difficult": false, "key_cls": "birth"}, {"transcription": "广东省广州某某666号—自制数据集", "points": [[87, 149], [279, 149], [279, 197], [87, 197]], "difficult": false, "key_cls": "adress"}, {"transcription": "661546442301175555", "points": [[145, 236], [312, 236], [312, 258], [145, 258]], "difficult": false, "key_cls": "card"}] Is this \t the two literal characters (a backslash and a "t"), or an actual tab character? Please check, and make sure each line is path + tab + label.
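
The difference between a typed-out backslash-t and a real tab can be shown, and repaired, with a small sketch like the one below. The names `LITERAL_TAB` and `fix_literal_tab` are hypothetical. The pattern only matches a backslash-t that sits directly before the opening `[` of the label, so Windows paths containing `\train` should be left alone.

```python
import re

# A literal backslash followed by "t" (two characters), with optional spaces
# around it, matched only when the JSON list's opening "[" comes next.
LITERAL_TAB = re.compile(r"[ ]*\\t[ ]*(?=\[)")


def fix_literal_tab(line):
    """Replace the first typed-out backslash-t separator with a real tab."""
    return LITERAL_TAB.sub("\t", line, count=1)


# Raw string: the separator here is two characters, not a tab.
typed = r'/PaddleOCR/train_data/det/train/0009.jpg \t [{"transcription": "x"}]'
fixed = fix_literal_tab(typed)

# Splitting on a real tab yields one field before the fix and two after.
print(len(typed.split("\t")), len(fixed.split("\t")))
```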

564142183 commented 1 month ago

@UserWangZz Thanks for the reply. I split the dataset with PPOCRLabel, using this command:

python PPOCRLabel/gen_ocr_train_val_test.py --trainValTestRatio 6:2:2 --datasetRootPath train_data --detRootPath train_data/det --recRootPath train_data/rec

The resulting train.txt looks like this. This format does contain a tab, doesn't it?

/PaddleOCR/train_data/det/train/0009.jpg    [{"transcription": "熊杼", "points": [[95, 55], [139, 55], [139, 80], [95, 80]], "difficult": false, "key_cls": "name"}, {"transcription": "性别机器人民族汉", "points": [[45, 88], [215, 88], [215, 109], [45, 109]], "difficult": false, "key_cls": "sex"}, {"transcription": "2013年4月5日", "points": [[89, 116], [244, 116], [244, 141], [89, 141]], "difficult": false, "key_cls": "birth"}, {"transcription": "广东省广州某某666号—自制数据集", "points": [[87, 149], [279, 149], [279, 197], [87, 197]], "difficult": false, "key_cls": "adress"}, {"transcription": "661546442301175555", "points": [[145, 236], [312, 236], [312, 258], [145, 258]], "difficult": false, "key_cls": "card"}]

But training keeps failing:

[2024/05/13 02:07:25] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 1000 iterations
ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
[2024-05-13 02:18:01,015] [ WARNING] dataloader_iter.py:707 - DataLoader 1 workers exit unexpectedly, pids: 9374
Traceback (most recent call last):
  File "/PaddleOCR/tools/train.py", line 255, in <module>
    main(config, device, logger, vdl_writer, seed)
  File "/PaddleOCR/tools/train.py", line 208, in main
    program.train(
  File "/PaddleOCR/tools/program.py", line 304, in train
    for idx, batch in enumerate(train_dataloader):
  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.py", line 826, in __next__
    self._reader.read_next_list()[0]
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at ../paddle/fluid/operators/reader/blocking_queue.h:171)
UserWangZz commented 1 month ago

I think I have run into this problem before. Are you training inside Docker? ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough

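The container was probably created without --shm-size. Depending on your server's configuration, add --shm-size {shared memory size}G when creating the container, or simply pass --ipc=host.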
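
To see whether shared memory is actually the bottleneck, a quick check along these lines can be run inside the container. It is only a sketch: `free_gib` is a hypothetical helper, and it assumes a Linux container where `/dev/shm` exists.

```python
import shutil


def free_gib(path="/dev/shm"):
    """Return the free space under `path` in GiB."""
    # 2**30 bytes per GiB.
    return shutil.disk_usage(path).free / 2**30
```

Running `print(free_gib())` before and after recreating the container shows whether the new setting took effect. Docker's default shared memory size is 64 MB, which is small for a DataLoader running several workers.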

You can also try setting this in the config file first:

loader:
    shuffle: true
    batch_size_per_card: 96
    drop_last: true
    num_workers: 8
    use_shared_memory: false
564142183 commented 4 weeks ago

@UserWangZz Thanks a lot. Starting Docker with --ipc=host fixed it.

Ghaz1i commented 3 weeks ago

G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\03.jpg [{"transcription": "0007.0", "points": [[612, 379], [1087, 379], [1087, 508], [612, 508]], "difficult": false}] Check whether the separator between the image path and the label is \t.

Hello, I am sure the separator is \t. Replacing / with \ in the path sometimes makes it work, but other times it does not.