WongKinYiu / yolor

Implementation of the paper "You Only Learn One Representation: Unified Network for Multiple Tasks" (https://arxiv.org/abs/2105.04206)
GNU General Public License v3.0

CUDA out of memory after some epochs #199

Open alejoGT1202 opened 2 years ago

alejoGT1202 commented 2 years ago

I'm training on an EC2 instance with a T4 GPU and 16 GB of memory.

I'm using a batch size of 2 and an image size of 960. However, after 3 epochs the script is killed because the GPU runs out of memory. How can I overcome this without reducing my batch size to 1?

Thanks for the help.

mburges-cvl commented 2 years ago

Hi, you can change this line:

    https://github.com/WongKinYiu/yolor/blob/be7da6eba2f612a15bf462951d3cdde66755a180/train.py#L219

and this line:

    https://github.com/WongKinYiu/yolor/blob/be7da6eba2f612a15bf462951d3cdde66755a180/train.py#L361

I'm not sure why the batch size is doubled during validation, but changing those two lines so it is no longer doubled solved the issue for me.
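
For anyone unsure what the change amounts to, here is a minimal sketch of the idea in Python. It assumes the two linked lines pass a doubled training batch size to the validation dataloader and to the test call, as the comment above describes. The helper `validation_batch_size` and its `multiplier` parameter are hypothetical names used only for illustration and are not part of yolor.

```python
# Hypothetical helper (not part of yolor): picks the batch size used for validation.
def validation_batch_size(train_batch_size: int, multiplier: int = 1) -> int:
    """Return the batch size to use for the validation dataloader and test call.

    multiplier=2 mirrors the doubling described in the comment above;
    multiplier=1 keeps validation at the training batch size, which should
    reduce the memory needed per validation step.
    """
    if train_batch_size < 1 or multiplier < 1:
        raise ValueError("batch size and multiplier must be positive integers")
    return train_batch_size * multiplier


batch_size = 2  # training batch size from the original question

doubled = validation_batch_size(batch_size, multiplier=2)  # doubled for validation
same = validation_batch_size(batch_size)                   # same as training
print(doubled, same)
```

In the actual script the equivalent edit would presumably be to pass the plain training batch size on both linked lines, so that validation never loads more images per step than training does.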