Closed StevenChumak closed 2 years ago
It's running out of memory in the validation stage. Are you using fp16? That will save memory. You could also try lowering the multi-scale inference to --n_scales "0.5,1.0" or --n_scales "0.5,1.0,1.5", which should help a little as well.
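To see why both suggestions help, here is a rough back-of-envelope sketch (my own illustration, not from the repo): activation memory in multi-scale inference grows roughly with the pixel count at each scale, i.e. with scale squared, and fp16 roughly halves the bytes per value compared to fp32. The default scales `0.5,1.0,2.0` used below are an assumption for illustration.

```python
def relative_memory(scales, bytes_per_value=4):
    """Rough relative activation memory for a list of inference scales.

    Assumes memory is proportional to pixel count (scale**2) times
    bytes per value (4 for fp32, 2 for fp16). Illustrative only.
    """
    return bytes_per_value * sum(s * s for s in scales)

# Hypothetical default: fp32 with scales 0.5, 1.0, 2.0
default = relative_memory([0.5, 1.0, 2.0])
# Suggested: fp16 with scales reduced to 0.5, 1.0
reduced = relative_memory([0.5, 1.0], bytes_per_value=2)

print(f"reduction factor: {default / reduced:.1f}x")  # → reduction factor: 8.4x
```

Under these assumptions, dropping the 2.0 scale alone already removes the single largest contributor (4x the base-resolution pixels), which is why trimming --n_scales has such a large effect during validation.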
Thank you! Using both the fp16 flag and dropping the inference scales down to 0.5 and 1.0 finally allowed me to train the model.
Hi,
I am trying to train an ocrnet.HRNet_Mscale model with a custom dataset, but I am running out of memory after the first epoch and I cannot figure out where my problem lies. I am using 2x RTX 2080 Ti with 11 GB each. CUDA version: 10.2, Python: 3.6.9, PyTorch version: 1.10.0+cu102
My CLI to launch training:
The dataloader (based on the Cityscapes dataloader):
And the error:
Could somebody point me in a direction on where to look for possible solutions?
Thanks