PaddlePaddle / PaddleYOLO

🚀🚀🚀 YOLO series of PaddlePaddle implementation, PP-YOLOE+, RT-DETR, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv10, YOLOX, YOLOv5u, YOLOv7u, YOLOv6Lite, RTMDet and so on. 🚀🚀🚀
https://github.com/PaddlePaddle/PaddleYOLO
GNU General Public License v3.0
547 stars, 133 forks

Training yolov7_l on the VOC dataset in a four-GPU environment pins GPU utilization at 100% and hangs, so training cannot continue. No code was modified. #178

Closed PHuiC closed 6 months ago

PHuiC commented 1 year ago

Search before asking

Bug Component

Training

Describe the Bug

No code changes of any kind were made.

Environment

Training on the PaddlePaddle training platform, with four GPUs.

Bug description confirmation

Are you willing to submit a PR?

nemonameless commented 1 year ago

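Try commenting out syncbn. This is caused by the platform environment. With or without syncbn the convergence curves differ, but the final accuracy is similar. https://github.com/PaddlePaddle/PaddleYOLO/blob/release/2.6/configs/yolov7/_base_/yolov7_elannet.yml#L2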
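
For readers who want to apply this suggestion without editing the file by hand, here is a minimal sketch. It assumes that the linked line is a top-level `norm_type: sync_bn` entry; that is an assumption based on the usual layout of these configs, not something stated in this thread. The helper name `comment_out_key` and the sample text are hypothetical.

```python
import re


def comment_out_key(yaml_text: str, key: str = "norm_type") -> str:
    """Comment out a top-level `key: value` line, leaving all other lines untouched."""
    pattern = re.compile(rf"^{re.escape(key)}\s*:")
    out = []
    for line in yaml_text.splitlines():
        # Top-level keys start at column 0; indented (nested) keys are not matched.
        out.append("# " + line if pattern.match(line) else line)
    return "\n".join(out)


# Hypothetical excerpt; the real yolov7_elannet.yml may differ.
sample = "architecture: YOLOv7\nnorm_type: sync_bn\nuse_ema: True"
print(comment_out_key(sample))
```

With the key commented out, the model should fall back to whatever normalization its modules use by default, which is presumably plain per-GPU batch norm. As the later replies show, this did not resolve the hang for everyone.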

PHuiC commented 1 year ago

Thank you very much, training now runs normally!

kingqiuol commented 1 year ago

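Commenting out syncbn did not help in my case. I am training yolov8 on 4 V100 GPUs. After a certain number of training iterations, GPU utilization goes to 100% and training hangs.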

kaixin-bai commented 11 months ago

I am using yolov5, and commenting out syncbn did not help either. Training still hangs, with utilization at 100% on every GPU.

nemonameless commented 6 months ago

Please update your paddle version and the code, then try again. Even if you have enough GPU memory, try not to fill it completely. Thanks.
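
Before and after updating, it can help to record which paddle build is actually installed and how many GPUs it sees. The sketch below is one way to do that, not a procedure given by the maintainers. The function name `paddle_env_summary` is hypothetical, and it returns a placeholder result when paddle is not importable.

```python
def paddle_env_summary() -> dict:
    """Report the installed paddle version and visible GPU count, if paddle is importable."""
    try:
        import paddle
    except ImportError:
        # paddle is not installed in this Python environment.
        return {"installed": False}
    return {
        "installed": True,
        "version": paddle.__version__,
        "cuda_build": paddle.is_compiled_with_cuda(),
        "gpu_count": paddle.device.cuda.device_count(),
    }


print(paddle_env_summary())
```

Running `paddle.utils.run_check()` afterwards is a reasonable extra step, since it exercises the installation itself rather than only reporting versions.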