JiahuiYu / slimmable_networks

Slimmable Networks, AutoSlim, and Beyond, ICLR 2019, and ICCV 2019

Memory Leak when Training US-Net #20

Closed VectorYoung closed 5 years ago

VectorYoung commented 5 years ago

Hi @JiahuiYu , I am trying to train US-MobileNet, but with the width range scaled up to [1.0, 2.0]. However, I get a 'CUDA out of memory' error. During training, the memory usage varies between 1000 MB and 11000 MB, and after some iterations it suddenly fails with 'CUDA out of memory'. I get the same issue when training US-ResNet. But it is fine with US-MobileNet_[0.25, 1].

One weird thing is that when I fix the 4 widths (e.g. [1.0, 1.5, 1.7, 2.0], just like Slimmable Networks), I don't have the memory issue, and the memory usage stays fixed at about 4200 MB.

I am guessing that some tensors or graphs are not being freed, but I don't know how to debug it. Have you seen the same issue?
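For context, universally slimmable training samples random widths each iteration (the sandwich rule: smallest width, largest width, plus random intermediate widths), so activation memory naturally differs from step to step. A minimal sketch of that sampling, with illustrative width values (the exact range and count here are assumptions, not the repo's actual configuration):

```python
import random

# Assumed width range matching the issue's setting; the repo's defaults differ.
WIDTH_MIN, WIDTH_MAX = 1.0, 2.0

def sample_widths(n_random=2):
    """Sandwich rule: always train the min and max widths, plus
    a few randomly sampled intermediate widths each iteration."""
    widths = [WIDTH_MIN, WIDTH_MAX]
    widths += [round(random.uniform(WIDTH_MIN, WIDTH_MAX), 2)
               for _ in range(n_random)]
    return widths

# Each training iteration runs forward/backward once per sampled width,
# so peak memory depends on which widths were drawn that step.
print(sample_widths())
```

With fixed widths (as in Slimmable Networks) the per-iteration memory footprint is constant, which matches the stable 4200 MB observed above.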

JiahuiYu commented 5 years ago

@VectorYoung Thanks for your interest!

The width 2.0 is probably too large, since memory grows quadratically as the width increases.
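The quadratic growth follows because a convolution's parameter (and activation-related) cost scales with both its input and output channel counts, and a width multiplier scales both. A back-of-the-envelope check with hypothetical channel counts:

```python
def conv_params(in_ch, out_ch, k=3):
    """Parameter count of a plain k x k convolution (bias ignored)."""
    return in_ch * out_ch * k * k

# Hypothetical base layer: 32 -> 64 channels.
base_in, base_out = 32, 64
for w in (0.25, 1.0, 2.0):
    p = conv_params(int(base_in * w), int(base_out * w))
    print(f"width {w}: {p} params")

# Doubling the width doubles both channel counts, so cost grows 4x,
# which is why width 2.0 is far heavier than width 1.0.
```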

The second behavior is interesting. Although I have not seen this issue in my own experiments, I suspect it may be due to PyTorch's memory management implementation. I don't think there is much room for us to debug that. But if you find a solution, I would appreciate it if you could post it here, since I am also interested in this question.

Sorry that I could not be of more help with these issues.