Closed: JustinLiu97 closed this issue 3 years ago.
Hi Justin,
Thank you for your interest in our work. I've seen several issues about a similar problem, but I haven't encountered it on my own server. It may be due to the Python or PyTorch version. Sorry, I can't help with that.
Thanks for your reply! By the way, is it possible for you to share your environment settings? I am using Python 3.5.2 and the same PyTorch version (0.3.1) as in requirement.txt, but I am running CUDA 10.0 with the corresponding cuDNN 7.6.5 (on an RTX 2080 Max-Q). Maybe I can try your CUDA settings and see if the problem still appears. Thanks!
I think this is not caused by the CUDA version. Could you please try changing --pool_size? This is mostly related to the RAM cost.
I changed --pool_size for training and it still takes 267 GB of RAM (although that is a decrease of almost 30 GB). I also noticed that training and testing need more than 10 minutes to start. During testing, RAM usage stays at a low level for some time, then increases to almost 150 GB and stays stable for a while, and then gradually increases to almost 290 GB by the end of testing.
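For context, --pool_size in CycleGAN-style repos usually controls an image buffer whose memory footprint grows linearly with its size. A minimal sketch of such a pool (the class and names here are illustrative, not the repo's actual code):

```python
import random

class ImagePool:
    """Buffer of previously generated images, reused to stabilize the
    discriminator. RAM usage grows linearly with pool_size, which is
    why shrinking pool_size reduces (but does not eliminate) RAM cost."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.images = []

    def query(self, image):
        if self.pool_size == 0:              # pool disabled: pass through
            return image
        if len(self.images) < self.pool_size:
            self.images.append(image)        # fill the pool first
            return image
        if random.random() > 0.5:            # swap with a stored image
            idx = random.randrange(self.pool_size)
            old = self.images[idx]
            self.images[idx] = image
            return old
        return image                         # or return the new image as-is
```

Note that the pool only bounds the number of *generated* images kept around; it does not explain a 290 GB footprint on its own.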
@JustinLiu97 https://github.com/TAMU-VITA/EnlightenGAN/blob/982e7a9b62599084ab75fb0a5c1e291d04f88fc3/predict.py#L32 Could you please check this line? I think the webpage will save all images to the buffer until all iterations finish.
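One way to avoid buffering every result until the loop ends is to write each image to disk as soon as it is produced, keeping no reference to it. A minimal sketch (the output path and the tensor-to-image conversion are assumptions, not the repo's actual code):

```python
import os

import numpy as np
from PIL import Image

def save_result(tensor, out_dir, name):
    """Convert a CHW float array in [-1, 1] to uint8 and save it
    immediately, so nothing accumulates in an in-memory buffer."""
    os.makedirs(out_dir, exist_ok=True)
    arr = np.asarray(tensor)                              # CHW, float
    arr = ((arr.transpose(1, 2, 0) + 1.0) / 2.0) * 255.0  # HWC, [0, 255]
    Image.fromarray(arr.clip(0, 255).astype(np.uint8)).save(
        os.path.join(out_dir, name))
```

Called once per iteration, this keeps peak RAM proportional to one image rather than the whole test set.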
I commented this line out and still see 290 GB+ RAM usage during testing.
Actually you need to delete all related lines about the webpage.
I have commented out every line related to the visualizer but still get the same outcome, so maybe it is not caused by the visualizer (I am testing on a single image).
Thanks for your feedback. Let me know if you find any way to solve this problem.
Sure. Thank you so much for your advice. I will close this for now and update here if I find a solution.
Hi TAMU-VITA:
Thank you so much for your impressive work! The results are just amazing.
When I was training and testing, I found that even when testing a single image, the Python process consumes almost 300 GB of CPU RAM. RAM usage also increases with the number of epochs during training. That increase can be avoided by setting num_workers (nThreads) to 0, but the process still takes almost 300 GB of RAM.
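For reference, num_workers is the torch.utils.data.DataLoader argument that controls how many worker subprocesses load data; setting it to 0 loads batches in the main process. A minimal sketch of the setting being described (the dataset here is a stand-in, not the repo's loader):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# With num_workers=0, batches are produced in the main process, so no
# worker subprocesses (each holding its own copy of dataset state) are
# spawned. This trades loading speed for lower, more stable RAM usage.
dataset = TensorDataset(torch.zeros(8, 3, 4, 4))
loader = DataLoader(dataset, batch_size=2, num_workers=0)

for (batch,) in loader:
    pass  # each batch has shape (2, 3, 4, 4)
```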
I am new to PyTorch, so is there any information on what causes this problem? The model weights do not seem to be that large.
Thanks!