Closed: zhan4817 closed this issue 4 years ago
Hello @zhan4817, I haven't experimented with your model in detail, so I don't have much to help you with. My only thought is that you might want to check memory usage for things like activations or anything else that needs to be stored for backpropagation. You could do this by looking at the difference between your memory usage under `model.train()` vs. `model.eval()`.
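On a GPU you could compare `torch.cuda.max_memory_allocated()` between the two modes directly. A device-independent way to get a rough sense of the same thing is to count activation bytes with forward hooks. A sketch with a hypothetical stand-in model (substitute your own):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; replace with the real network.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)

def activation_bytes(model, x):
    """Rough count of bytes in intermediate activations for one forward pass."""
    total = 0
    hooks = []

    def hook(module, inp, out):
        nonlocal total
        if isinstance(out, torch.Tensor):
            total += out.numel() * out.element_size()

    for m in model.modules():
        if len(list(m.children())) == 0:  # leaf modules only
            hooks.append(m.register_forward_hook(hook))
    model(x)
    for h in hooks:
        h.remove()
    return total

x = torch.randn(1, 1, 320, 320)
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
act_bytes = activation_bytes(model, x)
print(f"parameters: {param_bytes / 1e6:.3f} MB, activations: {act_bytes / 1e6:.3f} MB")
```

Even for this tiny model, the full-resolution activations are several orders of magnitude larger than the parameters, which is why parameter count alone says little about peak memory.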
You also might want to check in on the PyTorch forums.
The U-Net model does as few operations at full resolution as possible, since they are extremely memory-intensive. This is a fundamental constraint when designing models for this problem. You need operations that reduce the resolution in order to afford more than a small number of channels without running out of memory.
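A back-of-envelope calculation illustrates the point. Assuming float32 feature maps, a 320x320 input, and U-Net-style levels where channels double as resolution halves (all sizes here are illustrative assumptions):

```python
# Activation memory (bytes, float32) for conv feature maps.
def feature_map_bytes(channels, height, width, batch=1, bytes_per_elem=4):
    return batch * channels * height * width * bytes_per_elem

H = W = 320

# U-Net style: channels double as resolution halves,
# so memory per level actually *shrinks* as you go deeper.
unet_levels = [(32 * 2**i, H >> i, W >> i) for i in range(4)]
unet_bytes = sum(feature_map_bytes(c, h, w) for c, h, w in unet_levels)

# Shallow block at full resolution: every feature map stays 320x320.
rb_bytes = sum(feature_map_bytes(32, H, W) for _ in range(4))

print(f"u-net levels (32..256 ch): {unet_bytes / 2**20:.1f} MiB")
print(f"full-res block (32 ch):    {rb_bytes / 2**20:.1f} MiB")
```

Halving the spatial resolution while doubling the channels still halves the memory per level, so a shallow full-resolution block with modest channel counts can easily hold more activation memory than a much deeper downsampling network.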
Thank you for your comments!
I would like to test the VarNet model with the U-Net in each cascade replaced by a shallower residual block (RB), composed of Conv2d, ReLU, and cat operations.
From the weight-summary reports, the model with the U-Net has around 30 million parameters, while the one with the RB has only 1-2 million. However, the RB model seems to consume more GPU memory than the U-Net model (sometimes OOM), which looks strange to me, as I would have expected GPU memory usage to scale mostly with the parameter count of the network.
I haven't been able to figure this out, and I would like to hear your opinion on this issue. Thank you.
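For reference, a minimal sketch of what such a residual block might look like (the class name, channel counts, and layer layout here are all assumptions for illustration, not the actual code):

```python
import torch
import torch.nn as nn

class ShallowResidualBlock(nn.Module):
    """Hypothetical sketch of a shallow residual block built from
    Conv2d, ReLU, and a cat skip connection, as described above."""
    def __init__(self, in_ch=2, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        # cat increases the channel count; project back to in_ch
        self.out = nn.Conv2d(in_ch + hidden, in_ch, 1)

    def forward(self, x):
        return self.out(torch.cat([x, self.body(x)], dim=1))

block = ShallowResidualBlock()
x = torch.randn(1, 2, 320, 320)
print(block(x).shape)  # torch.Size([1, 2, 320, 320])
```

Although this block has only ~38k parameters, every intermediate tensor stays at full resolution (here 64 channels at 320x320 per conv), and one such tensor is stored for backpropagation in every cascade, which would be consistent with the activation-memory explanation above.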