Closed VectorYoung closed 5 years ago
@VectorYoung Thanks for your interest!
A width of 2.0 is probably too large, since memory grows roughly quadratically as the width increases.
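As a rough back-of-envelope check of that scaling (the channel counts below are hypothetical, not the actual US-MobileNet configuration): a conv layer's weight count depends on both input and output channels, and in a slimmable layer both scale linearly with the width multiplier, so the weights grow quadratically.

```python
def conv_params(width_mult, c_in=32, c_out=64, k=3):
    """Parameter count of a k x k conv whose channels are scaled
    by a slimmable-style width multiplier (hypothetical base sizes)."""
    # Both channel dimensions scale linearly with the multiplier ...
    cin = int(c_in * width_mult)
    cout = int(c_out * width_mult)
    # ... so the weight tensor k*k*cin*cout scales quadratically.
    return k * k * cin * cout

base = conv_params(1.0)
print(conv_params(2.0) / base)  # 4.0 -> doubling the width quadruples the weights
```

Activation maps, by contrast, only scale linearly in the channel dimension, but the weight and workspace growth is still enough to make width 2.0 far heavier than width 1.0.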
The second behavior is interesting. Although I have not seen this issue in my own experiments, I guess it may be due to PyTorch's memory management. I don't think there is much room for us to debug that, but if you do find a solution, I would appreciate it if you could post it here, since I am also interested in this question.
Sorry that I could not help with these issues.
Hi @JiahuiYu, I am trying to train US-MobileNet, but scaled up to widths [1.0, 2.0]. However, I get a 'CUDA out of memory' error. During training, memory usage varies between 1000MB and 11000MB, but after some iterations it suddenly fails with 'CUDA out of memory'. I get the same issue when training US-ResNet, but it is fine with US-MobileNet_[0.25, 1].
One weird thing is that when I fix the four widths (e.g. [1.0, 1.5, 1.7, 2.0], just like Slimmable Networks), I don't have the memory issue, and memory usage stays fixed at about 4200MB.
I am guessing some tensors or graphs are not being freed, but I don't know how to debug it. Do you have the same issue?
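One generic way to check for objects that are never freed is to walk Python's garbage-collector registry between iterations. A minimal sketch of the technique (`FakeTensor` is a hypothetical stand-in so the snippet runs without PyTorch; in a real training loop you would instead count objects for which `torch.is_tensor(obj)` is true, and also watch `torch.cuda.memory_allocated()`):

```python
import gc

class FakeTensor:
    """Hypothetical stand-in for torch.Tensor, used only for illustration."""
    def __init__(self, n):
        self.data = [0.0] * n

def count_live(cls):
    # Walk every object the garbage collector tracks and count the
    # instances of `cls` that are still reachable (i.e. not freed).
    return sum(1 for obj in gc.get_objects() if isinstance(obj, cls))

held = [FakeTensor(10) for _ in range(5)]  # referenced -> stays alive
FakeTensor(10)                             # unreferenced -> reclaimed
gc.collect()
print(count_live(FakeTensor))              # 5
```

If the count grows iteration after iteration, something (often a stored loss or logging variable that still holds the graph) is keeping tensors alive; if it stays flat while CUDA memory still spikes, the variation may just come from sampling different widths each step.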