joonas-yoon closed this issue 2 years ago
Hi @joonas-yoon :wave:
Thanks for reporting this! That's an interesting situation, my guess is that the GPU process memory computation has some issues with multi-GPU environments. So I see two things to do:
I'll try to solve this shortly. If you don't mind, I may ask you to try the snippet on the upcoming fix branch to check whether this is the source of the problem :)
In the meantime, your snippet has some missing imports & object instantiations — could you make it fully executable, please? (I'm mostly interested in the model instantiation and whether you moved the model to one of your GPUs.)
Hi @frgfm
I got exactly the same problem on a Kaggle notebook with 1 GPU.
Here is the link, where you can see the output: https://www.kaggle.com/joonasyoon/wgan-cp-with-celeba-and-lsun-dataset
When there are two models and I run summary on them, the first one shows zero RAM usage, and the second one shows a negative value.
Alright, I think I found a solution in #64 :+1: @joonas-yoon would you mind trying to install the "negative-ram" branch and check whether that solves your problem?
Side note: what you experienced on Kaggle (the second model showing a different RAM value) won't be fixed by this, as GPU RAM usage is reported blended across all objects in the process.
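To make the "blended" reporting concrete, here is a hypothetical, plain-Python illustration (not torchscan's actual code) of why such an overhead estimate can come out negative: if the framework & CUDA overhead is estimated as process-wide GPU memory minus the tensor memory the summary can attribute to the model, then querying process memory on the wrong GPU (or in a shared process) skews the subtraction. The function name and all numbers below are made up for illustration.

```python
def framework_overhead(process_gpu_mem_mb, model_mem_mb):
    # Hypothetical estimate: overhead = process GPU memory the summary
    # could not attribute to the model's own tensors.
    return process_gpu_mem_mb - model_mem_mb

# Healthy case: process memory = CUDA context + model tensors.
print(framework_overhead(650.0, 50.0))   # 600.0 -> plausible overhead

# Multi-GPU mismatch: the model lives on cuda:2, but process memory is
# queried on cuda:0, which reports ~0 MB for this process -> the
# estimate goes negative, matching the reported symptom.
overhead = framework_overhead(0.0, 390.33)
print(f"Framework & CUDA overhead: {overhead:.2f} Mb")  # -390.33 Mb
```

The same subtraction also explains the Kaggle case: two models in one process share one pool of GPU memory, so per-model attribution cannot be recovered from the process-wide counter alone.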
Oh! That is good news. I will install it directly to check :)
And thanks for the informative note.
Bug description
I have been following the DCGAN Tutorial with PyTorch and ran it in my Jupyter environment.
I tried to show the summary, and I got a result with negative RAM usage like
Framework & CUDA overhead: -390.33 Mb
I have 4 GPUs but use only 1 GPU for this script via config:
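The exact config used is not shown here; a common way to restrict a script to a single GPU is the sketch below (the device index `"0"` is a placeholder — any one of the 4 GPUs could be selected).

```python
import os

# Restrict this process to one physical GPU. This must be set BEFORE
# torch (or any CUDA library) initializes the CUDA context.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# From the framework's point of view, the selected GPU is then visible
# as device 0, e.g. torch.device("cuda:0") maps onto it.
print(os.environ["CUDA_VISIBLE_DEVICES"])  # 0
```

Note that with this masking, tools that query GPU memory per physical device can still disagree about which device the process actually allocated on.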
Here is the definition of the model.
Code snippet to reproduce the bug
The result of `summary(netG, (nz, 1, 1))`:

And this is the result for comparison with another module (torchsummary).
The result of `torchsummary.summary(netG, (nz, 1, 1))`:
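Whatever each tool reports for RAM, both should agree on the trainable-parameter count of the generator. As a cross-check, the count can be computed by hand in plain Python, assuming the DCGAN tutorial defaults (`nz=100`, `ngf=64`, `nc=3`, convolutions with `bias=False`):

```python
# Hand-computed parameter count for the DCGAN tutorial generator.
nz, ngf, nc = 100, 64, 3
k = 4  # kernel size used throughout the tutorial generator

def convT(cin, cout):
    # ConvTranspose2d with bias=False: weight only, cin*cout*k*k params
    return cin * cout * k * k

def bn(ch):
    # BatchNorm2d: learnable weight + bias per channel
    return 2 * ch

total = (convT(nz, ngf * 8) + bn(ngf * 8)
         + convT(ngf * 8, ngf * 4) + bn(ngf * 4)
         + convT(ngf * 4, ngf * 2) + bn(ngf * 2)
         + convT(ngf * 2, ngf) + bn(ngf)
         + convT(ngf, nc))
print(total)  # 3576704
```

If either summary reports a different parameter total, that tool is miscounting independently of the RAM issue.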
Error traceback
No error message.
Environment
I would prefer not to share environment details; sorry, this is due to a security agreement.