Training on a custom dataset fails with "places would be ommited when DataLoader is not iterable"

Scharfsinnig commented 3 years ago

I hit an error while training PaddleOCR on my own dataset and can't tell what is going wrong. Could someone help take a look?

Environment: Docker paddlepaddle/paddle:2.0.1-gpu-cuda10.1-cudnn7


Wed Mar 31 17:21:48 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   46C    P0    26W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   41C    P0    15W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


- Data: generated by annotating with the PaddleLabel tool
- Configuration:
  - configs/det/det_db_icdar15_reader.yml

TrainReader:
  reader_function: ppocr.data.det.dataset_traversal,TrainReader
  process_function: ppocr.data.det.db_process,DBProcessTrain
  num_workers: 8
  img_set_dir: ./train_data/self_data/train_imgs/
  label_file_path: ./train_data/self_data/train_label.txt

EvalReader:
  reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
  process_function: ppocr.data.det.db_process,DBProcessTest
  img_set_dir: ./train_data/self_data/train_imgs/
  label_file_path: ./train_data/self_data/train_label.txt
  test_image_shape: [736, 1280]

TestReader:
  reader_function: ppocr.data.det.dataset_traversal,EvalTestReader
  process_function: ppocr.data.det.db_process,DBProcessTest
  img_set_dir: ./train_data/self_data/test_imgs/
  label_file_path: ./train_data/self_data/test_label.txt
  do_eval: True

The other config files were changed in the same way: the Train, Eval, and Test entries now point at the corresponding generated directories.
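Since the readers above point at a hand-made label file, it may be worth ruling out a data problem before looking at the framework. The sketch below is only an illustration, not part of PaddleOCR: `check_det_labels` is a hypothetical helper, and it assumes the usual detection label layout (one line per image: image path, a tab, then a JSON list of boxes that each carry `transcription` and `points`). Adjust it if your annotation tool writes something different.

```python
import json
from pathlib import Path


def check_det_labels(label_file, img_dir):
    """Return a list of human-readable problems found in a detection label file.

    Assumed layout (not taken from this issue): `<image path>\t<JSON list of boxes>`.
    """
    problems = []
    img_dir = Path(img_dir)
    with open(label_file, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            parts = line.split("\t")
            if len(parts) != 2:
                problems.append(
                    f"line {lineno}: expected 2 tab-separated fields, got {len(parts)}"
                )
                continue
            name, raw = parts
            # Assumption: the image is stored under img_dir, either at the
            # relative path given in the label or under its bare file name.
            candidates = (img_dir / name, img_dir / Path(name).name)
            if not any(p.is_file() for p in candidates):
                problems.append(f"line {lineno}: image not found: {name}")
            try:
                boxes = json.loads(raw)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: annotation is not valid JSON ({exc.msg})")
                continue
            if not isinstance(boxes, list):
                problems.append(f"line {lineno}: annotation is not a JSON list")
                continue
            for i, box in enumerate(boxes):
                if not isinstance(box, dict) or "points" not in box or "transcription" not in box:
                    problems.append(f"line {lineno}: box {i} lacks 'points' or 'transcription'")
    return problems
```

With the paths from the config, a call would look like `check_det_labels("./train_data/self_data/train_label.txt", "./train_data/self_data/train_imgs/")`; an empty list means the file at least parses under these assumptions.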

- Command: `python3.7 -m paddle.distributed.launch --gpus '0,1' tools/train.py -c configs/det/det_r50_vd_db.yml -o Global.pretrain_weights=./pretrain_models/ResNet50_vd_ssld_pretrained`

One more note: the gpus argument does not seem to take effect. The run log below shows that card 0 is used by default.
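A possible explanation, judging only from the log in this post: the launcher reports "Local start 2 processes" and labels its table as the first process's environment, so `FLAGS_selected_gpus 0` there may describe worker 0 alone rather than the whole job. One way to check is to have every worker print its own settings. The sketch below reads the environment variable names that appear in the launcher table; `describe_worker` itself is a hypothetical helper, not a Paddle API.

```python
import os


def describe_worker(env=None):
    """Summarise the launcher-provided settings for the current worker process.

    The variable names are copied from the launcher's debug table in this
    issue; anything the launcher did not set is reported as "unset".
    """
    env = os.environ if env is None else env
    rank = env.get("PADDLE_TRAINER_ID", "unset")
    total = env.get("PADDLE_TRAINERS_NUM", "unset")
    gpus = env.get("FLAGS_selected_gpus", "unset")
    return f"trainer {rank}/{total} -> FLAGS_selected_gpus={gpus}"
```

Printing `describe_worker()` early in the training script (the exact place is up to you) and comparing the per-worker log files should show whether the second process was assigned card 1.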

- Result: see below

λ qyszh-dl /paddle/PaddleOCR {develop} python3.7 -m paddle.distributed.launch --gpus '0,1' tools/train.py -c configs/det/det_r50_vd_db.yml -o Global.pretrain_weights=./pretrain_models/ResNet50_vd_ssld_pretrained
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
/usr/local/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:26: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  def convert_to_list(value, n, name, dtype=np.int):
----------- Configuration Arguments -----------
gpus: 0,1
heter_worker_num: None
heter_workers:
http_port: None
ips: 127.0.0.1
log_dir: log
nproc_per_node: None
server_num: None
servers:
training_script: tools/train.py
training_script_args: ['-c', 'configs/det/det_r50_vd_db.yml', '-o', 'Global.pretrain_weights=./pretrain_models/ResNet50_vd_ssld_pretrained']
worker_num: None
workers:

WARNING 2021-03-31 09:11:34,610 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
launch train in GPU mode
INFO 2021-03-31 09:11:34,612 launch_utils.py:471] Local start 2 processes. First process distributed environment info (Only For Debug):
+=======================================================================================+
| Distributed Envs              Value                                                   |
+---------------------------------------------------------------------------------------+
| PADDLE_TRAINER_ID             0                                                       |
| PADDLE_CURRENT_ENDPOINT       127.0.0.1:51240                                         |
| PADDLE_TRAINERS_NUM           2                                                       |
| PADDLE_TRAINER_ENDPOINTS      127.0.0.1:51240,127.0.0.1:47012                         |
| FLAGS_selected_gpus           0                                                       |
+=======================================================================================+

INFO 2021-03-31 09:11:34,612 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
/usr/local/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:26: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  def convert_to_list(value, n, name, dtype=np.int):
2021-03-31 09:11:35,422-INFO: {'Global': {'debug': False, 'algorithm': 'DB', 'use_gpu': True, 'epoch_num': 1200, 'log_smooth_window': 20, 'print_batch_step': 2, 'save_model_dir': './output/det_db/', 'save_epoch_step': 200, 'eval_batch_step': [5000, 5000], 'train_batch_size_per_card': 8, 'test_batch_size_per_card': 16, 'image_shape': [3, 640, 640], 'reader_yml': './configs/det/det_db_icdar15_reader.yml', 'pretrain_weights': './pretrain_models/ResNet50_vd_ssld_pretrained', 'save_res_path': './output/det_db/predicts_db.txt', 'checkpoints': None, 'save_inference_dir': None, 'infer_img': None}, 'Architecture': {'function': 'ppocr.modeling.architectures.det_model,DetModel'}, 'Backbone': {'function': 'ppocr.modeling.backbones.det_resnet_vd,ResNet', 'layers': 50}, 'Head': {'function': 'ppocr.modeling.heads.det_db_head,DBHead', 'model_name': 'large', 'k': 50, 'inner_channels': 256, 'out_channels': 2}, 'Loss': {'function': 'ppocr.modeling.losses.det_db_loss,DBLoss', 'balance_loss': True, 'main_loss_type': 'DiceLoss', 'alpha': 5, 'beta': 10, 'ohem_ratio': 3}, 'Optimizer': {'function': 'ppocr.optimizer,AdamDecay', 'base_lr': 0.001, 'beta1': 0.9, 'beta2': 0.999}, 'PostProcess': {'function': 'ppocr.postprocess.db_postprocess,DBPostProcess', 'thresh': 0.3, 'box_thresh': 0.7, 'max_candidates': 1000, 'unclip_ratio': 1.5}, 'TrainReader': {'reader_function': 'ppocr.data.det.dataset_traversal,TrainReader', 'process_function': 'ppocr.data.det.db_process,DBProcessTrain', 'num_workers': 8, 'img_set_dir': './train_data/self_data/train_imgs/', 'label_file_path': './train_data/self_data/train_label.txt'}, 'EvalReader': {'reader_function': 'ppocr.data.det.dataset_traversal,EvalTestReader', 'process_function': 'ppocr.data.det.db_process,DBProcessTest', 'img_set_dir': './train_data/self_data/train_imgs/', 'label_file_path': './train_data/self_data/train_label.txt', 'test_image_shape': [736, 1280]}, 'TestReader': {'reader_function': 'ppocr.data.det.dataset_traversal,EvalTestReader', 'process_function': 'ppocr.data.det.db_process,DBProcessTest', 'img_set_dir': './train_data/self_data/test_imgs/', 'label_file_path': './train_data/self_data/test_label.txt', 'do_eval': True}}
3 640 640
/usr/local/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /paddle/PaddleOCR/ppocr/modeling/heads/det_db_head.py:123
The behavior of expression A - B has been unified with elementwise_sub(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_sub(X, Y, axis=0) instead of A - B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/usr/local/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /paddle/PaddleOCR/ppocr/modeling/losses/det_basic_loss.py:46
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/usr/local/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /paddle/PaddleOCR/ppocr/modeling/losses/det_basic_loss.py:47
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/usr/local/lib/python3.7/site-packages/paddle/fluid/datafeeder.py:56: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  np.bool, np.float16, np.float32, np.float64, np.int8, np.int16,
/usr/local/lib/python3.7/site-packages/paddle/fluid/framework.py:686: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  elif dtype == np.bool:
/usr/local/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /paddle/PaddleOCR/ppocr/modeling/losses/det_basic_loss.py:100
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/usr/local/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /paddle/PaddleOCR/ppocr/modeling/losses/det_basic_loss.py:103
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/usr/local/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /paddle/PaddleOCR/ppocr/modeling/losses/det_basic_loss.py:113
The behavior of expression A - B has been unified with elementwise_sub(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_sub(X, Y, axis=0) instead of A - B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/usr/local/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /paddle/PaddleOCR/ppocr/modeling/losses/det_basic_loss.py:113
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
2021-03-31 09:11:36,005-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000000] in Optimizer will not take effect, and it will only be applied to other Parameters!
3 640 640
./train_data/self_data/train_label.txt
2021-03-31 09:11:37,537-INFO: places would be ommited when DataLoader is not iterable
W0331 09:11:37.597898 2396 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.2, Runtime API Version: 10.2
W0331 09:11:37.601858 2396 device_context.cc:372] device: 0, cuDNN Version: 7.6.
INFO 2021-03-31 09:11:43,647 launch_utils.py:307] terminate all the procs
ERROR 2021-03-31 09:11:43,647 launch_utils.py:545] ABORT!!! Out of all 2 trainers, the trainer process with rank=[1] was aborted. Please check its log.
INFO 2021-03-31 09:11:46,650 launch_utils.py:307] terminate all the procs
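Two details in this log may narrow things down. The "places would be ommited" line is printed at INFO level, so it is probably not the failure itself; the ERROR line says the trainer with rank 1 aborted and asks to check its log. The launcher only names `log/workerlog.0`, so the sketch below assumes the other workers write sibling `workerlog.<rank>` files in the same `log` directory. `tail_worker_logs` is a hypothetical helper for collecting the last lines of each one.

```python
from collections import deque
from pathlib import Path


def tail_worker_logs(log_dir="log", lines=30):
    """Return {file name: last `lines` lines} for every workerlog.* in log_dir.

    Assumption: each trainer writes its own `workerlog.<rank>` file next to
    the `log/workerlog.0` file mentioned by the launcher.
    """
    tails = {}
    for path in sorted(Path(log_dir).glob("workerlog.*")):
        with open(path, encoding="utf-8", errors="replace") as f:
            # deque with maxlen keeps only the final `lines` entries.
            tails[path.name] = list(deque((line.rstrip("\n") for line in f), maxlen=lines))
    return tails
```

Running it from the PaddleOCR directory and printing the entry for rank 1 should surface the traceback that the launcher output hides.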



How can this be resolved? What needs to be changed?

Scharfsinnig commented 3 years ago

@dyning Could you help take a look?

WenmuZhou commented 3 years ago

Your training code is the static-graph version. With paddle 2.0 we recommend using the dynamic-graph version of the code.
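For anyone unsure which flavour of the code they have checked out, the config printed in the log above gives a hint: it carries a `reader_yml` entry under `Global` and `function` entries such as `ppocr.modeling.architectures.det_model,DetModel`. The sketch below turns that observation into a rough check. It is a heuristic only; `looks_like_static_graph_config` is a hypothetical helper, and the idea that dynamic-graph configs lack these keys is an assumption to verify against the branch you actually use.

```python
def looks_like_static_graph_config(config):
    """Guess whether a loaded PaddleOCR config dict uses the style seen in this issue.

    Heuristic: the config in this issue's log has `Global.reader_yml` and
    per-section `function` entries. Assumption (unverified here): configs for
    the dynamic-graph code do not use either key.
    """
    global_section = config.get("Global", {})
    if isinstance(global_section, dict) and "reader_yml" in global_section:
        return True
    return any(
        isinstance(section, dict) and "function" in section
        for section in config.values()
    )
```

If this returns `True` for your YAML while Paddle 2.0 is installed, that matches the mismatch described in the comment above.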

ghost commented 3 years ago

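> Your training code is the static-graph version. With paddle 2.0 we recommend using the dynamic-graph version of the code.

Isn't the release version dynamic-graph?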

paddle-bot-old[bot] commented 3 years ago

Since you haven't replied for more than 3 months, we have closed this issue/pr. If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up. It is recommended to pull and try the latest code first.

PaddlePaddle / PaddleOCR

Training on a custom dataset fails with "places would be ommited when DataLoader is not iterable" #2375