PaddlePaddle / PaddleX

Low-code development tool based on PaddlePaddle
Apache License 2.0
4.77k stars, 937 forks

Training terminated abnormally, please restart training #1733

Closed: Fyee closed this issue 1 year ago

Fyee commented 1 year ago

Checklist:

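  1. Searched past related issues for an answer
  2. Read through the FAQ summary and Q&A
  3. Confirmed whether the bug is still unfixed in the latest version
  4. If the bug is caused by PaddleX API 2.0 and is already fixed on the develop branch, see FAQ Q4 to replace the built-in PaddleX API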

Problem description

Dataset validation succeeds, but model training fails with an error.

Reproduction

  1. Please provide the error message and related log (see FAQ Q2 for how to find the log):

         Signal handlers are set for stagelog cleanup.
         D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:328: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
           self.gallery1 = gr.Gallery(
         D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:337: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
           self.gallery2 = gr.Gallery(
         D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:346: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
           self.gallery3 = gr.Gallery(
         Model files found in folder D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output: []
         Model files found in folder D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output: []
         Model files found in folder D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output: []
         Running on local URL: http://127.0.0.1:63666
         To create a public link, set share=True in launch().
         Running on local URL: http://127.0.0.1:55236
         To create a public link, set share=True in launch().
         Signal handlers are set for stagelog cleanup.
         D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:328: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
           self.gallery1 = gr.Gallery(
         D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:337: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
           self.gallery2 = gr.Gallery(
         D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\dataset_ui.py:346: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
           self.gallery3 = gr.Gallery(
         Model files found in folder D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output: []
         Model files found in folder D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output: []
         Model files found in folder D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output: []
         Running on local URL: http://127.0.0.1:55237
         To create a public link, set share=True in launch().
         click dataset_varify_btn, start checking dataset, config: model name: picodet_layout_1x, dataset type: COCODetDataset,dataset path: data/example_data/det_layout_examples, max_show_cv: 10
         Signal handlers are set for stagelog cleanup.
         Dataset validation succeeded
         Executing: "D:\5.Software\PaddleX DeskTop\resources\codelab\python.exe" "D:\5.Software\PaddleX DeskTop\workdir\2293451\1\run_paddlex.py" --exec_train
         Signal handlers are set for stagelog cleanup.
         ['D:\5.Software\PaddleX DeskTop\resources\codelab\python.exe', 'tools/train.py', '--eval', '--config', 'C:\Users\11924\.paddle_uapi\tmpgovj8gy7\detmodel_picodet_layout_1x.yml', '--use_vdl', 'True', '--vdl_log_dir', 'D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output']
         Log path: D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output\train.log
         Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
         loading annotations into memory...
         Done (t=0.03s)
         creating index...
         index created!
         [09/14 16:00:57] ppdet.data.source.coco INFO: Load [90 samples valid, 0 samples invalid] in file D:\5.Software\PaddleX DeskTop\workdir\2293451\1\data\example_data\det_layout_examples\annotations/instance_train.json.
         W0914 16:00:57.090493 13520 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.0, Runtime API Version: 11.2
         W0914 16:00:57.104496 13520 gpu_resources.cc:149] device: 0, cuDNN Version: 8.4.
         [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37] in pretrained weight head.head_cls0.bias is unmatched with the shape [43] in model head.head_cls0.bias. And the weight head.head_cls0.bias will not be loaded
         [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37, 128, 1, 1] in pretrained weight head.head_cls0.weight is unmatched with the shape [43, 128, 1, 1] in model head.head_cls0.weight. And the weight head.head_cls0.weight will not be loaded
         [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37] in pretrained weight head.head_cls1.bias is unmatched with the shape [43] in model head.head_cls1.bias. And the weight head.head_cls1.bias will not be loaded
         [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37, 128, 1, 1] in pretrained weight head.head_cls1.weight is unmatched with the shape [43, 128, 1, 1] in model head.head_cls1.weight. And the weight head.head_cls1.weight will not be loaded
         [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37] in pretrained weight head.head_cls2.bias is unmatched with the shape [43] in model head.head_cls2.bias. And the weight head.head_cls2.bias will not be loaded
         [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37, 128, 1, 1] in pretrained weight head.head_cls2.weight is unmatched with the shape [43, 128, 1, 1] in model head.head_cls2.weight. And the weight head.head_cls2.weight will not be loaded
         [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37] in pretrained weight head.head_cls3.bias is unmatched with the shape [43] in model head.head_cls3.bias. And the weight head.head_cls3.bias will not be loaded
         [09/14 16:00:58] ppdet.utils.checkpoint INFO: The shape [37, 128, 1, 1] in pretrained weight head.head_cls3.weight is unmatched with the shape [43, 128, 1, 1] in model head.head_cls3.weight. And the weight head.head_cls3.weight will not be loaded
         [09/14 16:00:59] ppdet.utils.checkpoint INFO: Finish loading model weights: C:\Users\11924/.cache/paddle/weights\picodet_lcnet_x1_0_fgd_layout.pdparams
         Traceback (most recent call last):
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\tools\train.py", line 209, in <module>
             main()
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\tools\train.py", line 205, in main
             run(FLAGS, cfg)
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\tools\train.py", line 158, in run
             trainer.train(FLAGS.eval)
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\engine\trainer.py", line 580, in train
             outputs = model(data)
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
             return self.forward(*inputs, **kwargs)
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\architectures\meta_arch.py", line 60, in forward
             out = self.get_loss()
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\architectures\picodet.py", line 82, in get_loss
             head_outs, _ = self._forward()
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\architectures\picodet.py", line 66, in _forward
             fpn_feats = self.neck(body_feats)
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
             return self.forward(*inputs, **kwargs)
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\necks\csp_pan.py", line 331, in forward
             inner_out = self.top_down_blocks[len(self.in_channels) - 1 - idx](
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
             return self.forward(*inputs, **kwargs)
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\necks\csp_pan.py", line 213, in forward
             x_main = self.main_conv(x)
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
             return self.forward(*inputs, **kwargs)
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\uapi\cv_repos\PaddleDetection\ppdet\modeling\necks\csp_pan.py", line 53, in forward
             x = self.bn(self.conv(x))
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
             return self.forward(*inputs, **kwargs)
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\layer\norm.py", line 781, in forward
             return batch_norm(
           File "D:\5.Software\PaddleX DeskTop\resources\codelab\lib\site-packages\paddle\nn\functional\norm.py", line 199, in batch_norm
             batch_norm_out, _, _, _, _, _ = _C_ops.batch_norm(
         MemoryError:

         C++ Traceback (most recent call last):
         Not support stack backtrace yet.

         Error Message Summary:
         ResourceExhaustedError: Out of memory error on GPU 0. Cannot allocate 11.132812MB memory on GPU 0, 7.999512GB memory has been allocated and available memory is only 0.000000B.
         Please check whether there is any other process using GPU 0.
         1. If yes, please stop them, or start PaddlePaddle on another GPU.
         2. If no, please decrease the batch size of your model.
          (at ..\paddle\fluid\memory\allocation\cuda_allocator.cc:86)

         Traceback (most recent call last):
           File "D:\5.Software\PaddleX DeskTop\workdir\2293451\1\run_paddlex.py", line 55, in <module>
             runner.run()
           File "D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\base_run_paddlex.py", line 402, in run
             self.run_train()
           File "D:\5.Software\PaddleX DeskTop\workdir\2293451\1\base\base_run_paddlex.py", line 233, in run_train
             self.uapi_model.train(
           File "uapi\cv_uapi\paddledet_uapi\det\model.py", line 80, in uapi.cv_uapi.paddledet_uapi.det.model.DetModel.train
           File "uapi\cv_uapi\paddledet_uapi\det\model.py", line 82, in uapi.cv_uapi.paddledet_uapi.det.model.DetModel.train
           File "uapi\cv_uapi\paddledet_uapi\det\model.py", line 93, in uapi.cv_uapi.paddledet_uapi.det.model.DetModel.train
           File "uapi\cv_uapi\paddledet_uapi\det\runner.py", line 29, in uapi.cv_uapi.paddledet_uapi.det.runner.DetRunner.train
           File "uapi\base\runner.py", line 343, in uapi.base.runner.BaseRunner.run_cmd
         uapi.base.utils.errors.CalledProcessError: Command ['D:\5.Software\PaddleX DeskTop\resources\codelab\python.exe', 'tools/train.py', '--eval', '--config', 'C:\Users\11924\.paddle_uapi\tmpgovj8gy7\detmodel_picodet_layout_1x.yml', '--use_vdl', 'True', '--vdl_log_dir', 'D:\5.Software\PaddleX DeskTop\workdir\2293451\1\output'] returned non-zero exit status 1.
         Training terminated abnormally, please restart training

  2. Please provide the GUI version you are using: PaddlePaddle AI Suite (飞桨AI套件), current version: 2.1.0

  3. Please provide your operating system (Linux/Windows/macOS): Windows 11

  4. Which CUDA/cuDNN versions are you using? CUDA 11.7, cuDNN 8.4.1, GPU: GTX 3060TI
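
The error summary in the log first asks whether another process is holding memory on GPU 0. One way to check is sketched below, assuming the NVIDIA driver's `nvidia-smi` tool is on `PATH`. The helper names `parse_compute_apps` and `list_gpu_processes` are hypothetical and not part of PaddleX. On Windows the per-process memory column may read `[N/A]`, which the parser maps to `None`.

```python
import shutil
import subprocess

# nvidia-smi query that prints one CSV row per compute process:
# pid, process_name, used_memory (MiB, because of "nounits").
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Turn the CSV text printed by QUERY into a list of dicts."""
    rows = []
    for line in csv_text.splitlines():
        if line.count(",") < 2:
            continue  # blank or malformed line
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        pid, name, mem = pid.strip(), name.strip(), mem.strip()
        if not pid.isdigit():
            continue
        rows.append({
            "pid": int(pid),
            "name": name,
            # Memory can be reported as "[N/A]"; keep that as None.
            "used_mib": int(mem) if mem.isdigit() else None,
        })
    return rows


def list_gpu_processes():
    """Return compute processes seen by nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return []
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for proc in list_gpu_processes():
        print(proc)
```

If the list shows a process other than the training run (for example an earlier training process that never exited), stopping it before retraining is what the error message recommends.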

Fyee commented 1 year ago

Resolved.

wyf00747 commented 1 day ago

> Resolved.

How did you solve it? I have been looking into this for ages...
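
The thread does not record the actual fix. The only remedy named in the log itself is the error summary's second suggestion: decrease the batch size. A minimal sketch of that adjustment follows, assuming the generated config uses the PaddleDetection-style keys `TrainReader.batch_size` and `LearningRate.base_lr`. The values below are made up for illustration, the helper `shrink_batch` is hypothetical, and scaling the learning rate in proportion to the batch size is a common heuristic rather than something the thread confirms.

```python
import copy


def shrink_batch(cfg, factor=2):
    """Return a copy of cfg with a smaller batch size and a scaled learning rate.

    cfg is a plain dict mirroring a PaddleDetection-style YAML config.
    """
    new_cfg = copy.deepcopy(cfg)
    old_bs = cfg["TrainReader"]["batch_size"]
    new_bs = max(1, old_bs // factor)
    new_cfg["TrainReader"]["batch_size"] = new_bs
    # Linear scaling heuristic (an assumption): keep lr / batch_size constant.
    new_cfg["LearningRate"]["base_lr"] = (
        cfg["LearningRate"]["base_lr"] * new_bs / old_bs
    )
    return new_cfg


# Illustrative values only; they are not taken from the issue.
cfg = {
    "TrainReader": {"batch_size": 24},
    "LearningRate": {"base_lr": 0.06},
}

smaller = shrink_batch(cfg)
print(smaller["TrainReader"]["batch_size"], smaller["LearningRate"]["base_lr"])
```

In the desktop GUI the batch size is normally set in the training parameters rather than by editing the temporary YAML. If the error persists at a small batch size, the first suggestion in the error summary (another process occupying the GPU) is the more likely cause.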