Open gillmac13 opened 4 years ago
It goes without saying that I am using 1 GPU, which appears to be fully utilized (up to 100%) throughout all epochs.
Hi @gillmac13. Yes, I also noticed the mem leak during training, but haven't had enough time to fix it due to other tasks recently. Will try to get it done ASAP and let you know when it's finished. Sorry for the trouble.
Many thanks! I may have a hint towards a solution. When I take "eval_callback" off the list of callbacks, I get a small mem leak during each epoch. When I put it back, the mem leak is much larger (roughly 3x) and occurs both during an epoch and during the evaluation.
Evidently some objects in memory are not released when needed. Since I only have 1 worker selected, I thought setting "use_multiprocessing" to True would help release the old objects without other consequences such as data duplication. I do get a cryptic warning about "causing nondeterministic deadlocks", but with only 1 worker, is that really a concern?
Good news: launching a training session with "use_multiprocessing=True" works with minimal memory leak and with comparable accuracy results!
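For reference, a minimal self-contained sketch of the change (the generator and model here are toy placeholders, not the ones in train.py; only the `workers` / `use_multiprocessing` arguments matter):

```python
import numpy as np
from tensorflow import keras

# Toy stand-in for the real data pipeline, just to show the flag.
class DummySequence(keras.utils.Sequence):
    def __init__(self, num_batches=10, batch_size=8):
        self.num_batches = num_batches
        self.batch_size = batch_size

    def __len__(self):
        return self.num_batches

    def __getitem__(self, idx):
        x = np.random.rand(self.batch_size, 64, 64, 3).astype("float32")
        y = np.random.rand(self.batch_size, 10).astype("float32")
        return x, y

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(64, 64, 3)),
    keras.layers.Dense(10),
])
model.compile(optimizer="adam", loss="mse")

# The only change versus the default call: run the generator in a worker
# process so Python-side objects get released between epochs.
model.fit(
    DummySequence(),
    epochs=2,
    workers=1,
    use_multiprocessing=True,
)
```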
I must update the above statement: there is a residual leak despite the multiprocessing scheme. It allowed me to reach epoch 56 (val accuracy of 0.8, as expected), but the process footprint reached 13 GB (it started at 5.4 GB at the 1st epoch). One option for training further is to resume from the latest saved checkpoint. Trying that...
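Roughly the resume pattern I have in mind, as a toy sketch (the checkpoint file name and the real hourglass model would of course differ):

```python
import numpy as np
from tensorflow import keras

# Toy model / data just to illustrate resuming; the real run would reload
# the hourglass checkpoint written by the ModelCheckpoint callback.
x = np.random.rand(32, 16).astype("float32")
y = np.random.rand(32, 1).astype("float32")

model = keras.Sequential([keras.layers.Dense(1, input_shape=(16,))])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=2)
model.save("checkpoint_ep002.h5")  # what ModelCheckpoint would have written

# Later (e.g. after the process got killed): reload the checkpoint and
# continue, with the epoch counter picking up where the last run stopped.
model = keras.models.load_model("checkpoint_ep002.h5")
model.fit(x, y, initial_epoch=2, epochs=4)
```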
Hi @gillmac13, many thanks for the info you provided. I've committed a fix for the mem leak. You can pick up the latest code and give it a try.
Hi @david8862,
Perfect! It works. And thanks for fixing this promptly. My task now is to evolve "good" models. So far HG2 seems a bit light for my dataset (82% accuracy); some PyTorch implementations of HG2-8 claim accuracies of up to 90% on MPII, like here: https://github.com/crockwell/pytorch_stacked_hourglass_cutout, or even here (no code available, and it's an evolution of HG): https://openreview.net/pdf?id=HkM3vjCcF7
Hi @gillmac13, many thanks for sharing. The main target of my work is to deploy CNN models to IoT/embedded platforms, so I focus more on lightweight models. But I'll also check these enhancements later and try to pick them up if they fit my platform.
Hi David,
I have tried to run a training session; at this point I would like to compare the performance of your stacked-hourglass-keypoint-detection solution against another solution I am familiar with in PyTorch (Deep HRNet).
My dataset is an extension of MPII (same format) with a lot of proprietary images and annotations. So it's easy to run: $ python3 train.py because all defaults are appropriate.
However, on my first trial, the running process was "Killed" at epoch 25/100. The accuracy curve looked OK (0.69 at that point). So I retried, but this time I monitored the CPU and RAM usage. I found out (using "top") that at each epoch, free RAM drops by roughly 1 GB. It is not gradual over the course of an epoch; it is lost in one lump at the beginning of each new epoch. I changed --batch_size to 8, and the leak is about 0.5 GB per epoch.
Have you ever experienced this, and do you have an idea how to fix it? I am running on Ubuntu 18.04 and TF 2.1.
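In case it helps reproduce, a small callback along these lines could log the process footprint at each epoch end instead of eyeballing top (assumes psutil is installed; this callback is not part of the repo):

```python
import os
import psutil
from tensorflow import keras

class MemoryLogger(keras.callbacks.Callback):
    """Print the resident set size of the training process at each epoch end."""
    def on_epoch_end(self, epoch, logs=None):
        rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3
        print(f"epoch {epoch + 1}: process RSS = {rss_gb:.2f} GB")

# Append it to the existing callbacks list passed to model.fit, e.g.:
# callbacks = [checkpoint, eval_callback, MemoryLogger()]
```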