THU-MIG / yolov10

YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
https://arxiv.org/abs/2405.14458
GNU Affero General Public License v3.0

GPU memory usage issue during training YOLO-S model #361

[Open] WYL-Projects opened this issue 3 months ago

WYL-Projects commented 3 months ago

Dear author,

I have been training yolov10-s on my own object detection data (only two categories) on NVIDIA GeForce RTX 3090 and NVIDIA A100-SXM4-40GB GPUs. The program occasionally hits an out-of-memory error, which interrupts training; training is only stable once I reduce the batch size to 16. I don't understand why yolov10-s training consumes so much GPU memory, and since the memory usage fluctuates it is hard to choose a batch size. The details are as follows:

[screenshot attached]

Looking forward to the author's reply, thanks!
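
For reference, a minimal sketch of a training call that pins the batch size at 16, the setting reported as stable above. The `from ultralytics import YOLOv10` import follows the repo's README; the weight file and dataset yaml names are assumptions, not the reporter's actual setup:

```python
# Sketch only: train yolov10-s with a fixed batch size of 16.
# "yolov10s.pt" and "data.yaml" are placeholder names; adjust to your setup.
from ultralytics import YOLOv10

model = YOLOv10("yolov10s.pt")  # pretrained yolov10-s weights
model.train(
    data="data.yaml",  # two-class dataset config (assumed filename)
    epochs=500,
    imgsz=640,
    batch=16,  # fixed batch size that avoided the intermittent OOM
)
```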

htwang14 commented 3 months ago

Hi, I ran into the same issue. I'm training yolov10-n and GPU memory grows as training progresses, finally hitting a CUDA OOM error at around epoch 280 of 500. I have already removed the training and validation samples that contain too many (>= 500) objects. Any chance you have already found a solution? Thanks!
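
For anyone wanting to apply the same filtering, here is a minimal sketch, assuming YOLO-format `.txt` label files with one object per line; the directory layout is a placeholder and the `>= 500` threshold comes from the comment above:

```python
# Sketch: flag samples whose YOLO label file has >= 500 objects.
from pathlib import Path

MAX_OBJECTS = 500
labels_dir = Path("dataset/labels/train")  # assumed dataset layout

for label_file in labels_dir.glob("*.txt"):
    # One non-empty line per annotated object in YOLO format.
    num_objects = sum(1 for line in label_file.read_text().splitlines() if line.strip())
    if num_objects >= MAX_OBJECTS:
        print(f"would drop {label_file} ({num_objects} objects)")
        # label_file.unlink()  # also remove the matching image if filtering in place
```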

arielkantorovich commented 2 months ago

Hi, did you manage to find out anything about this problem? I have a similar problem, and I work with an NVIDIA RTX 4090.

bluesy7585 commented 2 months ago

Setting workers to 2 helps avoid CUDA out of memory in my case.
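
In the ultralytics-style training API this corresponds to the `workers` argument; a hedged sketch, where the other arguments are placeholders:

```python
# Sketch of the workaround: cap dataloader workers at 2 during training.
from ultralytics import YOLOv10

model = YOLOv10("yolov10n.pt")  # placeholder weights
model.train(data="data.yaml", epochs=500, batch=16, workers=2)
```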