PaddlePaddle / PaddleX

All-in-One Development Tool based on PaddlePaddle (PaddlePaddle low-code, full-workflow development tool)
Apache License 2.0

Instance segmentation training fails with PaddleX and the Xiaoduxiong dataset #1651

Open lightingyang0 opened 1 year ago

lightingyang0 commented 1 year ago

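Instance segmentation training with PaddleX on the Xiaoduxiong dataset fails. Please help me analyze the problem. Thank you.

Hardware: 1) Laptop with a GPU

Software: 1) PaddleX 2.1.0 2) PaddlePaddle 2.2.1 3) CUDA 11.2 4) cuDNN 8.1.0

Dataset: 1) The built-in Xiaoduxiong dataset

Training parameters:
Model: MaskRCNN
Backbone: ResNet50
Use FPN: yes
Pretrained weights: IMAGENET
Image size: 756 x 1008 (the default value was also tried)
Device: CPU (GPU was also tried, with a similar error)
Image mean: default
Image variance: default
Epochs: 12
Learning rate: 0.00012500
Batch size: 1
Advanced training parameters:
Warm-up learning rate: 0.00005 (other values were also tried)
Warm-up steps: 10
Learning-rate decay epochs: [8, 11]
Data augmentation: random horizontal flip on, everything else off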
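For anyone trying to reproduce this outside the GUI, the reported settings can be written down as keyword arguments. The sketch below only builds plain dictionaries. The key names are assumed to match the `MaskRCNN` constructor and `train()` method of the PaddleX 2.x Python API and should be checked against the installed version; the commented-out calls and the dataset variables in them are hypothetical.

```python
# Settings copied from the report above. Key names are an assumption based on
# the PaddleX 2.x Python API; verify them against your installed version.
MODEL_KWARGS = {
    "backbone": "ResNet50",
    "with_fpn": True,
}

TRAIN_KWARGS = {
    "num_epochs": 12,
    "train_batch_size": 1,
    "learning_rate": 0.000125,
    "warmup_steps": 10,
    "warmup_start_lr": 0.00005,
    "lr_decay_epochs": [8, 11],
    "pretrain_weights": "IMAGENET",
}


def check_schedule(kwargs):
    """Return a list of problems found in the learning-rate schedule settings."""
    problems = []
    decay = kwargs["lr_decay_epochs"]
    if list(decay) != sorted(decay):
        problems.append("lr_decay_epochs must be increasing")
    if decay and decay[-1] >= kwargs["num_epochs"]:
        problems.append("last decay epoch must be smaller than num_epochs")
    if kwargs["warmup_start_lr"] > kwargs["learning_rate"]:
        problems.append("warmup_start_lr must not exceed learning_rate")
    return problems


# Hypothetical usage (not executed here; train_dataset / eval_dataset are placeholders):
# import paddlex as pdx
# model = pdx.det.MaskRCNN(num_classes=len(train_dataset.labels), **MODEL_KWARGS)
# model.train(train_dataset=train_dataset, eval_dataset=eval_dataset, **TRAIN_KWARGS)
```

With the values from the report, `check_schedule(TRAIN_KWARGS)` finds nothing wrong, which is consistent with the failure coming from memory rather than from the schedule.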

Contents of err.log:

This log file path is C:\paddlex_workspace\projects\P0012\T0045\err.log
Note: entries marked WARNING/INFO are warnings or hints only, not errors.
C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddle\tensor\creation.py:130: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if data.dtype == np.object:
Process Process-1:16:
Traceback (most recent call last):
  File "multiprocessing\process.py", line 297, in _bootstrap
  File "multiprocessing\process.py", line 99, in run
  File "paddlexui\pms\model_tasks\tasks.py", line 73, in _call_paddlex_train
  File "paddlexui\pms\model_tasks\train\detection.py", line 263, in train
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddlex\cv\models\detector.py", line 2226, in train
    early_stop_patience, use_vdl, resume_checkpoint)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddlex\cv\models\detector.py", line 334, in train
    use_vdl=use_vdl)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddlex\cv\models\base.py", line 337, in train_loop
    outputs = self.run(self.net, data, mode='train')
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddlex\cv\models\detector.py", line 105, in run
    net_out = net(inputs)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddle\fluid\dygraph\layers.py", line 914, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddlex\ppdet\modeling\architectures\meta_arch.py", line 59, in forward
    out = self.get_loss()
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddlex\ppdet\modeling\architectures\mask_rcnn.py", line 123, in get_loss
    bbox_loss, mask_loss, rpn_loss = self._forward()
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddlex\ppdet\modeling\architectures\mask_rcnn.py", line 100, in _forward
    bbox_targets, bbox_feat)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddle\fluid\dygraph\layers.py", line 914, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddlex\ppdet\modeling\heads\mask_head.py", line 246, in forward
    targets, bbox_feat)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddlex\ppdet\modeling\heads\mask_head.py", line 190, in forward_train
    mask_feat = self.head(rois_feat)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddle\fluid\dygraph\layers.py", line 914, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddlex\ppdet\modeling\heads\mask_head.py", line 101, in forward
    return self.upsample(feats)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddle\fluid\dygraph\layers.py", line 914, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddle\fluid\dygraph\container.py", line 98, in forward
    input = layer(input)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddle\fluid\dygraph\layers.py", line 914, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddle\nn\layer\conv.py", line 841, in forward
    data_format=self._data_format)
  File "C:\Users\cnhyan\Downloads\PaddleX_GUI_2.1.0_win10\paddle\nn\functional\conv.py", line 1062, in conv2d_transpose
    pre_bias = getattr(_C_ops, op_type)(x, weight, *attrs)
SystemError: (Fatal) Operator conv2d_transpose raises an struct paddle::memory::allocation::BadAlloc exception. The exception content is :ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 295.375244MB memory on GPU 0, 4.000000GB memory has been allocated and available memory is only 0.000000B.

Please check whether there is any other process using GPU 0.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please decrease the batch size of your model.

    (at ..\paddle\fluid\memory\allocation\cuda_allocator.cc:79) . (at ..\paddle\fluid\imperative\tracer.cc:221)
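The numbers in the out-of-memory message are easier to compare once they are in the same unit. The following is a minimal sketch that extracts them with a regular expression. The pattern is written against the exact wording of the message above and is an assumption about its format, not a PaddlePaddle API; binary units (1 MB = 1024 * 1024 bytes) are also assumed.

```python
import re

# Byte multipliers for the unit suffixes seen in the message (binary units assumed).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

_SIZE = r"(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)"
_OOM_PATTERN = re.compile(
    r"Cannot allocate " + _SIZE + r" memory on GPU (\d+), "
    + _SIZE + r" memory has been allocated and available memory is only " + _SIZE
)


def _to_bytes(value, unit):
    """Convert a (number, unit) pair captured from the message to bytes."""
    return float(value) * _UNITS[unit]


def parse_oom(message):
    """Return the sizes in the OOM message as bytes, or None if it does not match."""
    match = _OOM_PATTERN.search(message)
    if match is None:
        return None
    req_v, req_u, gpu, alloc_v, alloc_u, avail_v, avail_u = match.groups()
    return {
        "gpu": int(gpu),
        "requested_bytes": _to_bytes(req_v, req_u),
        "allocated_bytes": _to_bytes(alloc_v, alloc_u),
        "available_bytes": _to_bytes(avail_v, avail_u),
    }
```

Applied to the message in this report, the parsed values show the failure directly: about 295 MB was requested while the full 4 GB was already allocated and nothing remained available.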

lailuboy commented 1 year ago

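The error appears to be caused by insufficient GPU memory or system RAM. What hardware are you running on? Please also provide your GPU and memory details.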
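One way to collect the requested GPU details is to query `nvidia-smi`. This is a minimal sketch, assuming the NVIDIA driver is installed and `nvidia-smi` is on the PATH; the helper names are illustrative, and the parser expects plain integer values in MiB.

```python
import shutil
import subprocess

# nvidia-smi query for GPU name and memory figures, as CSV without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(text):
    """Parse 'name, total, used' CSV lines into dictionaries (memory in MiB)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, total, used = [field.strip() for field in line.rsplit(",", 2)]
        rows.append({"name": name, "total_mib": int(total), "used_mib": int(used)})
    return rows


def query_gpus():
    """Run nvidia-smi and return one dictionary per GPU, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)
```

Calling `query_gpus()` before starting training also shows whether another process is already holding GPU memory, which is the first thing the error message asks to check.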

lightingyang0 commented 1 year ago

Hello: the GPU is an NVIDIA RTX A2000 Laptop GPU with 4 GB of video memory. The RAM, CPU, and other configuration details are in the screenshot below.


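On 2023-01-31 11:37:46, "laibaohua" @.***> wrote:

The error appears to be caused by insufficient GPU memory or system RAM. What hardware are you running on? Please also provide your GPU and memory details.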
