deep-learning-with-pytorch / dlwpt-code

Code for the book Deep Learning with PyTorch by Eli Stevens, Luca Antiga, and Thomas Viehmann.
https://www.manning.com/books/deep-learning-with-pytorch

p2ch12: memory explosion when training the balanced model with the same config as p2ch11 #94

Open icegomic opened 2 years ago

icegomic commented 2 years ago

I use the same config (epoch=1, work_num=8, batch_size=32). It works well in p2ch11, where RAM stays stable at about 6 GB, but when I run `python -m p2ch12.training --balanced` the RAM usage climbs very high and soon exceeds the maximum. After that my computer stops responding and I have to restart it. What happened?
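A first thing worth trying, since each DataLoader worker process holds its own copy of the dataset object plus its own queue of prefetched batches, is to lower the worker count and batch size (the book's training scripts accept `--num-workers` and `--batch-size` flags; worth double-checking in your copy). Below is a minimal, self-contained diagnostic sketch, not based on the book's LunaDataset, that watches host RSS while a DataLoader runs with different worker counts; `psutil` and the `DummyDataset` are assumptions made for illustration only.

```python
# Minimal diagnostic sketch: watch host RSS while a DataLoader runs with
# different worker counts. Uses a dummy dataset (NOT the book's LunaDataset)
# and psutil, which is an extra dependency.
import psutil
import torch
from torch.utils.data import Dataset, DataLoader

class DummyDataset(Dataset):
    def __init__(self, n=200_000):
        # A large collection of Python objects: every worker process ends up
        # with its own copy (fork's copy-on-write pages are duplicated as soon
        # as Python touches refcounts), so host RAM grows with num_workers.
        self.samples = [(i, float(i)) for i in range(n)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        i, val = self.samples[idx]
        return torch.full((32, 48, 48), val)  # stand-in for a CT chunk

def total_rss_mb():
    # RSS of the main process plus all live worker processes, in MiB.
    proc = psutil.Process()
    rss = proc.memory_info().rss
    for child in proc.children(recursive=True):
        try:
            rss += child.memory_info().rss
        except psutil.NoSuchProcess:
            pass
    return rss / 2**20

def peak_rss_mb(num_workers):
    loader = DataLoader(DummyDataset(), batch_size=32, num_workers=num_workers)
    peak = 0.0
    for i, batch in enumerate(loader):
        peak = max(peak, total_rss_mb())
        if i >= 50:  # a short run is enough to see the trend
            break
    return peak

if __name__ == '__main__':
    for workers in (0, 2, 8):
        print(f"num_workers={workers}: peak RSS ~ {peak_rss_mb(workers):.0f} MiB")
```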

Va6lue commented 1 year ago

I also encounter the memory explosion issue in p2ch11 when running validation. My machine has 32 GB of RAM. I have no idea what is happening.

donnoc commented 7 months ago

I have the same issue with the p2ch11 code. While validating, memory usage explodes. I am also very interested in what causes this, or why the DataLoader sometimes uses GPU memory efficiently and sometimes floods the computer's main memory (RAM) first.

Side note: after I implemented my own version of the code for practice, memory exploded during both training and validation. I had to tweak the number of workers and the batch size to get a good run.

Edit: maybe this will help: https://ppwwyyxx.com/blog/2022/Demystify-RAM-Usage-in-Multiprocess-DataLoader/
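The core point of that post is that, with the default fork start method on Linux, a Dataset holding large Python containers (lists, dicts) is effectively duplicated in every worker, because refcount updates dirty the copy-on-write pages. The usual workaround is to keep per-sample metadata in flat numpy arrays that workers can share read-only. A sketch of that layout, with names and shapes that are illustrative only and not taken from the book's code:

```python
# Sketch of the copy-on-write-friendly layout the linked post recommends:
# keep per-sample metadata in flat numpy arrays instead of lists/dicts of
# Python objects, so fork()ed workers can keep sharing the pages read-only.
# Names and shapes here are illustrative, not taken from the book's code.
import numpy as np
import torch
from torch.utils.data import Dataset

class MetadataArrayDataset(Dataset):
    def __init__(self, n=1_000_000):
        # Problematic for multi-worker loading:
        #   self.samples = [{'uid': i, 'label': i % 2} for i in range(n)]
        # Refcount-free alternative that is not duplicated per worker:
        self.uids = np.arange(n, dtype=np.int64)
        self.labels = (np.arange(n) % 2).astype(np.int64)

    def __len__(self):
        return len(self.uids)

    def __getitem__(self, idx):
        uid = int(self.uids[idx])
        label = int(self.labels[idx])
        # In real code the sample would be loaded from disk using `uid`;
        # a random tensor stands in for the actual data here.
        x = torch.randn(1, 32, 48, 48)
        return x, label
```

Two other knobs worth checking in recent PyTorch versions are DataLoader's `prefetch_factor` and `persistent_workers` arguments: as far as I understand, the host memory held in prefetch queues scales roughly with num_workers × prefetch_factor × the size of one batch.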