Open icegomic opened 2 years ago
I also hit the memory explosion issue in p2ch11 during validation. My machine has 32 GB of RAM. I have no idea what is causing it.
I have the same issue with the p2ch11 code. During validation, memory usage explodes. I would also like to know what causes this, and why the DataLoader sometimes uses GPU memory efficiently and at other times floods the system memory (RAM?) beforehand.
Side note: after I wrote my own version of the code for practice, memory explodes during both training and validation. I have to tune the number of workers and the batch size to get a good run.
Edit: maybe this will help: https://ppwwyyxx.com/blog/2022/Demystify-RAM-Usage-in-Multiprocess-DataLoader/
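As far as I understand the linked post, the growth comes from fork-started workers: merely reading a Python object updates its reference count, which writes to the memory page holding it, so copy-on-write ends up duplicating large lists of Python objects in every worker. One mitigation is to keep the dataset's metadata in a single contiguous buffer instead of many small objects. The post does this with NumPy; below is a minimal standard-library sketch of the same idea. `PackedList` and the sample records are hypothetical names for illustration, not part of the book's code.

```python
import pickle
from array import array


class PackedList:
    """Read-only sequence whose items live in one contiguous bytes buffer.

    Assumption: items are picklable and are never modified after packing.
    """

    def __init__(self, items):
        offsets = array("q", [0])  # byte offset where each item starts
        chunks = []
        for item in items:
            blob = pickle.dumps(item, protocol=pickle.HIGHEST_PROTOCOL)
            chunks.append(blob)
            offsets.append(offsets[-1] + len(blob))
        self._data = b"".join(chunks)  # one large object instead of many small ones
        self._offsets = offsets

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        start, end = self._offsets[index], self._offsets[index + 1]
        # Only the requested item is deserialized on access.
        return pickle.loads(self._data[start:end])


# Hypothetical candidate records standing in for the dataset's sample list.
records = [("series-%d" % i, (i, i + 1, i + 2)) for i in range(100)]
packed = PackedList(records)
```

A Dataset could hold `packed` in place of the original list and index into it from `__getitem__`. Whether this helps in p2ch11 depends on where the memory actually goes, so it is worth measuring per-worker memory before and after.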
I use the same config in both cases: epoch=1, work_num=8, batch_size=32. It works well in p2ch11, where RAM stays stable at 6 GB. But when I run 'python -m p2ch12.training --balanced', RAM usage climbs very high and soon exceeds the maximum. After that my computer stops responding and I have to restart it. What happened?
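For a rough sense of how much RAM the DataLoader's prefetch queue alone can hold with these settings, here is a back-of-the-envelope sketch. It relies on the fact that PyTorch prefetches up to `prefetch_factor` batches per worker (default 2). The sample size is an assumption (a 32x48x48 float32 crop), so substitute your own value; `prefetch_bytes` is a hypothetical helper, not a PyTorch function.

```python
def prefetch_bytes(num_workers, batch_size, sample_bytes, prefetch_factor=2):
    """Upper bound on bytes held in prefetched batches across all workers."""
    return num_workers * prefetch_factor * batch_size * sample_bytes


# Assumed sample: 32 x 48 x 48 voxels, 4 bytes each (float32).
sample_bytes = 32 * 48 * 48 * 4

estimate = prefetch_bytes(num_workers=8, batch_size=32, sample_bytes=sample_bytes)
print(f"{estimate / 2**20:.0f} MiB")  # prints "144 MiB"
```

If the estimate is small compared with the growth you observe, as it is under these assumptions, the prefetch queue is unlikely to be the only cause, and per-worker copies of dataset state or caches are worth checking.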