PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0
42.44k stars 7.66k forks

FCENet and DRRG error: ABORT!!! Out of all 4 trainers, the trainer process with rank=[0, 1, 2, 3] was aborted #9847

Closed Menpinland closed 1 year ago

Menpinland commented 1 year ago

For text detection on the same dataset, DB, EAST, SAST, and PSENet all fine-tune normally once the config is modified, but FCENet and DRRG fail during training.

Error message:

Traceback (most recent call last):
  File "tools/train.py", line 208, in <module>
    main(config, device, logger, vdl_writer)
  File "tools/train.py", line 183, in main
    amp_level, amp_custom_black_list)
  File "/home/xxx/code/PaddleOCR/tools/program.py", line 288, in train
    preds = model(images, data=batch[1:])
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/parallel.py", line 752, in forward
    outputs = self._layers(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxx/code/PaddleOCR/ppocr/modeling/architectures/base_model.py", line 93, in forward
    x = self.neck(x)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/xxx/code/PaddleOCR/ppocr/modeling/necks/fpn_unet.py", line 90, in forward
    x = paddle.concat([x, c3], axis=1)
  File "/usr/local/lib/python3.7/dist-packages/paddle/tensor/manipulation.py", line 331, in concat
    return paddle.fluid.layers.concat(input=x, axis=axis, name=name)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/layers/tensor.py", line 343, in concat
    _C_ops.concat(input, out, 'axis', axis)
ValueError: (InvalidArgument) The 2-th dimension of input[0] and input[1] is expected to be equal.But received input[0]'s shape = [4, 128, 136, 180], input[1]'s shape = [4, 512, 135, 180].
  [operator < concat > error]

INFO 2023-04-28 10:16:57,206 launch_utils.py:343] terminate all the procs
INFO 2023-04-28 10:16:57,206 launch_utils.py:343] terminate all the procs
ERROR 2023-04-28 10:16:57,207 launch_utils.py:642] ABORT!!! Out of all 4 trainers, the trainer process with rank=[0, 1, 2, 3] was aborted. Please check its log.
ERROR 2023-04-28 10:16:57,207 launch_utils.py:642] ABORT!!! Out of all 4 trainers, the trainer process with rank=[0, 1, 2, 3] was aborted. Please check its log.
INFO 2023-04-28 10:17:01,211 launch_utils.py:343] terminate all the procs
INFO 2023-04-28 10:17:01,211 launch_utils.py:343] terminate all the procs
INFO 2023-04-28 10:17:01,212 launch.py:402] Local processes completed.
INFO 2023-04-28 10:17:01,212 launch.py:402] Local processes completed.
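The two shapes in the error differ only in dimension 2 (height): 136 versus 135. One plausible reading, not confirmed anywhere in this thread, is that the input height is not a multiple of 32, so the upsampled deep feature map ends up one pixel taller than the stride-8 skip feature `c3`. The sketch below only reproduces that arithmetic. It assumes each stride-2 stage of the backbone rounds up and that the neck upsamples by exactly 2x; the height 1080 is a hypothetical value chosen because it yields the reported sizes, and `feature_size`, `upsampled_vs_skip`, and `pad_to_multiple` are illustrative helpers, not PaddleOCR functions.

```python
import math


def feature_size(size: int, stride: int) -> int:
    """Spatial size at a given total stride, assuming every stride-2
    stage rounds up (as a 3x3 conv with stride 2 and padding 1 does)."""
    return math.ceil(size / stride)


def upsampled_vs_skip(size: int) -> tuple:
    """Return (2x-upsampled stride-16 size, stride-8 skip size).

    The two values must be equal for a channel-wise concat to succeed.
    """
    deep = feature_size(size, 16)   # deeper feature map
    skip = feature_size(size, 8)    # skip connection (c3 in the traceback)
    return 2 * deep, skip


def pad_to_multiple(size: int, multiple: int = 32) -> int:
    """Smallest value >= size that is divisible by `multiple`."""
    return math.ceil(size / multiple) * multiple


# Hypothetical input height that reproduces the mismatch in the error.
print(upsampled_vs_skip(1080))

# After padding the height to a multiple of 32 the sizes agree.
padded = pad_to_multiple(1080)
print(padded, upsampled_vs_skip(padded))
```

If this reading is right, resizing or padding the training images so that both height and width are multiples of 32 would make the concat shapes agree. DB, EAST, SAST, and PSENet may not hit the problem simply because their default pipelines already produce such sizes, but that is also an assumption.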

Main config changes:

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/xxx/datasets/algo/data-small-train
    label_file_list:

Could someone explain what causes this error?

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.