Closed: xsacha closed this issue 4 years ago
I haven't seen this issue before in similar PyTorch training scenarios. I can normally use a batch size of 256, but when resuming from a checkpoint I have to drop to 224.
It seems like some of the memory used to load the resumed model is never freed.
Edit: I resolved the issue by adding: del(checkpoint)
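
For anyone hitting the same thing, here is a rough sketch of what a resume step with that fix might look like. Only the `del` comes from the post above; the function name `resume_from_checkpoint` and the checkpoint keys `"model"`, `"optimizer"` and `"epoch"` are made up for illustration, and `map_location="cpu"` and `torch.cuda.empty_cache()` are extra precautions I am assuming are useful, not something the original fix used. The likely cause is that a checkpoint saved from GPU tensors is loaded back onto the GPU by default, so the loaded dict keeps a second copy of the weights alive for as long as it is referenced.

```python
import torch
import torch.nn as nn


def resume_from_checkpoint(path, model, optimizer):
    """Restore model and optimizer state, then release the loaded dict.

    The keys "model", "optimizer" and "epoch" are hypothetical; use
    whatever keys your own training script saved.
    """
    # Assumption: loading onto the CPU avoids a second copy of the
    # weights on the GPU. load_state_dict copies the values into the
    # parameters that already live on the model's device.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    start_epoch = checkpoint["epoch"] + 1

    # The fix from the post: drop the last reference to the loaded
    # tensors so they can be freed before training starts.
    del checkpoint
    if torch.cuda.is_available():
        # Optional: hand cached, now-unused blocks back to the driver.
        torch.cuda.empty_cache()
    return start_epoch
```

Calling this once before the training loop, for example `start_epoch = resume_from_checkpoint("last.pt", model, optimizer)` with a hypothetical file name, keeps the loaded dict scoped to the function, so it cannot linger in the training script's namespace.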
Thanks for your contribution!