Closed petteriTeikari closed 4 years ago
Hi @petteriTeikari ,
Thanks for your experiments with MONAI. I didn't quite understand the "double load" issue you described. How did you detect that it loaded twice? Is it a PyTorch Lightning specific issue?
Thanks.
@Nic-Ma "Double load" as in: when I start training on the first epoch the GPU memory usage is ~2.7 GB, on the first validation pass it goes up to ~6.7 GB, and during the initial validation loading it briefly went above 7.2 GB.
That is not the behavior I have seen with the standard "MONAI U-Net", which allocates the GPU RAM it needs for the model at the start.
Also, after an overnight training attempt I hit the ancdata error somewhere between epochs 30 and 40 of training. Before that the loss seemed to be falling, so it was sort of working.
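For anyone trying to reproduce the numbers above, a minimal sketch of logging PyTorch's own allocator statistics at the start of training and of validation might look like the following. The helper name `gpu_memory_report` is hypothetical and not part of MONAI or PyTorch Lightning. Note that tools such as nvidia-smi show the memory reserved by PyTorch's caching allocator (plus the CUDA context), which can be noticeably higher than the memory actually held by live tensors.

```python
import torch


def gpu_memory_report(tag, device=None):
    """Return PyTorch CUDA allocator statistics in MiB (hypothetical helper).

    On a machine without CUDA all figures are reported as 0.0.
    """
    if not torch.cuda.is_available():
        return {"tag": tag, "allocated_mib": 0.0, "reserved_mib": 0.0, "peak_mib": 0.0}
    mib = 1024 ** 2
    return {
        "tag": tag,
        # Memory currently occupied by live tensors.
        "allocated_mib": torch.cuda.memory_allocated(device) / mib,
        # Memory held by the caching allocator (closer to what nvidia-smi shows).
        "reserved_mib": torch.cuda.memory_reserved(device) / mib,
        # Highest value of allocated memory seen so far.
        "peak_mib": torch.cuda.max_memory_allocated(device) / mib,
    }
```

Calling `print(gpu_memory_report("train start"))` and `print(gpu_memory_report("val start"))` from the respective steps would show whether the jump is in allocated memory (more live tensors) or only in reserved memory (allocator cache).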
Hi @petteriTeikari ,
I checked your network implementation and unfortunately didn't find any explicit memory-related difference between your network and the MONAI UNet. Could you please paste your full program here? Maybe our PyTorch Lightning experts @marksgraham and @ericspod can also take a look at your issue.
Thanks.
@Nic-Ma
I had a bit of a look at what is going on, and yes, the increase in GPU memory always occurs upon the first validation (in the first epoch). A similar nondeterministic failure also occurred during training, though with a different error message, "AttributeError: 'Net' object has no attribute 'self'": https://github.com/petteriTeikari/MONAI_lightning_segmentation/issues/1
I will test these without the PyTorch Lightning part and see if this makes any difference.
Is your feature request related to a problem? Please describe. I wanted to try a 3D segmentation net (namely https://github.com/ozan-oktay/Attention-Gated-Networks/blob/master/models/networks/unet_CT_multi_att_dsv_3D.py) with my existing codebase
by simply replacing this (other parts are similar to the Spleen PyTorch Lightning tutorial https://github.com/Project-MONAI/MONAI/blob/master/examples/notebooks/spleen_segmentation_3d_lightning.ipynb)
with
And it seems to be working (it works okay with the standard MONAI U-Net), but the net seems to be loaded to GPU memory again when the validation dataset is first used
-> validation set
Describe the solution you'd like Is there a recommended way (or a tutorial coming) to use existing networks with MONAI when you already have the data pipeline working for your dataset, without the "double load"? Is there a point where I should release the GPU memory between splits (e.g. https://discuss.pytorch.org/t/how-can-we-release-gpu-memory-cache/14530, https://github.com/PyTorchLightning/pytorch-lightning/issues/458)?
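To make the question concrete, here is a rough sketch in plain PyTorch (no Lightning) of what "releasing GPU memory between splits" could look like. The function name `validate` and its arguments are assumptions for illustration, not an existing MONAI API. The validation pass runs under `torch.no_grad()` so that no autograd graph is kept, and `torch.cuda.empty_cache()` is called afterwards. As far as I understand, `empty_cache()` only returns unused cached blocks to the driver; it does not free memory held by tensors that are still referenced.

```python
import torch
from torch import nn


def validate(model, batches, device="cpu"):
    """Run a validation pass without building autograd graphs (illustrative sketch)."""
    was_training = model.training
    model.eval()
    outputs = []
    with torch.no_grad():
        for x in batches:
            # Move results back to the CPU so they do not pin GPU memory.
            outputs.append(model(x.to(device)).cpu())
    if was_training:
        model.train()  # restore the mode the caller was using
    if torch.cuda.is_available():
        # Hand unused cached blocks back to the driver before training resumes.
        torch.cuda.empty_cache()
    return outputs
```

For example, `validate(nn.Linear(4, 2), [torch.randn(3, 4)])` returns a list with one tensor of shape `(3, 2)` that does not require gradients.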
Additional context
with the
dataset
transformations
Full network definition: