dr-pato / audio_visual_speech_enhancement

Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
https://dr-pato.github.io/audio_visual_speech_enhancement/
Apache License 2.0
106 stars, 25 forks

memory leak on gpu #15

Closed: khs8727 closed this issue 4 years ago

khs8727 commented 4 years ago

Hi, I ran into a memory issue (in system RAM, not VRAM) when running this code on the GPU. After 1-2 training epochs (15000 samples, batch size 128), the process was killed, either with a segmentation fault or with no error message at all.

I noticed that the metric computation (SNR, SDR, etc.) keeps increasing CPU utilization and RAM usage, so I removed it, but the memory leak still did not go away.

Any clues as to why the memory leak happens?

Ubuntu 16.04 / TensorFlow (GPU) 1.14 (also tested with 1.15) / GTX 1080 Ti
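To narrow down which part of a training step keeps host RAM growing (as was done here by removing the SNR/SDR metrics), one option is Python's built-in `tracemalloc`: take a snapshot after a warm-up step, run N more steps, and list the source lines whose allocations were never freed. This is a minimal sketch; `find_leaks` and `leaky_step` are hypothetical names. Caveat: `tracemalloc` only sees memory allocated through Python's allocators (recent NumPy versions also report their array buffers to it), not memory allocated inside TensorFlow's C++ runtime, so a leak in the TFRecord reader itself would show up in the process RSS (e.g. in `top`) but not here.

```python
import tracemalloc


def find_leaks(step_fn, n_steps=50, top=3):
    """Call step_fn repeatedly and report Python-level memory that is never freed.

    Returns (bytes_kept_per_step, top_growing_source_lines).
    """
    tracemalloc.start()
    try:
        step_fn()  # warm-up: let one-off allocations (caches, lazy init) happen first
        before = tracemalloc.take_snapshot()
        for _ in range(n_steps):
            step_fn()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Ignore memory held by tracemalloc's own snapshot objects.
    ignore = [tracemalloc.Filter(False, tracemalloc.__file__)]
    diffs = after.filter_traces(ignore).compare_to(before.filter_traces(ignore), "lineno")
    growing = [d for d in diffs if d.size_diff > 0]
    return sum(d.size_diff for d in growing) / n_steps, growing[:top]


if __name__ == "__main__":
    history = []

    def leaky_step():
        # Hypothetical step that keeps a per-step result forever (e.g. metric values).
        history.append(bytearray(10_000))

    per_step, lines = find_leaks(leaky_step, n_steps=20)
    print(f"~{per_step:.0f} bytes kept per step")
    for stat in lines:
        print(stat)
```

Running it once with the real training step and once with the metric computation disabled would show whether that code path holds on to memory across steps.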

dr-pato commented 4 years ago

Hi @khs8727, a batch size of 128 samples is very large for these models, and you don't need it to be that large; I used a batch size of 8 at most, so try decreasing it. The memory leak may be caused by growth of the computational graph. If you manage to track it down, let me know. Sorry, I don't have time to look into the code in the near future.
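Growth of the computational graph is a classic TensorFlow 1.x pitfall: calling graph-building functions (a metric, a `tf.constant`, a new dataset iterator) inside the training loop adds new ops on every step, and the graph, which lives in host memory, keeps growing until the process is killed. The usual safeguard in TF 1.x is to call `finalize()` on the graph before the loop (for example `sess.graph.finalize()`), after which any attempt to add an op raises a `RuntimeError`. The framework-free toy below only mimics that mechanism; `ToyGraph` and `train` are hypothetical names, not TensorFlow APIs, and nothing here confirms that this repository actually has such a leak.

```python
class ToyGraph:
    """Hypothetical stand-in for a TF1 tf.Graph that only records op names."""

    def __init__(self):
        self.ops = []
        self._finalized = False

    def add_op(self, name):
        # Mimics tf.Graph, which refuses new ops once finalize() has been called.
        if self._finalized:
            raise RuntimeError("Graph is finalized and cannot be modified.")
        self.ops.append(name)

    def finalize(self):
        self._finalized = True


def train(graph, n_steps, build_op_in_loop, finalize=False):
    """Return the number of ops in the graph after each training step."""
    graph.add_op("sdr_metric")  # correct pattern: build the op once, before the loop
    if finalize:
        graph.finalize()  # any op created from here on is a bug and fails loudly
    sizes = []
    for step in range(n_steps):
        if build_op_in_loop:
            # Anti-pattern: a new op every step, so the graph (and host RAM) keeps growing.
            graph.add_op(f"sdr_metric_{step}")
        sizes.append(len(graph.ops))
    return sizes


if __name__ == "__main__":
    print(train(ToyGraph(), 3, build_op_in_loop=False))  # [1, 1, 1]
    print(train(ToyGraph(), 3, build_op_in_loop=True))   # [2, 3, 4]
```

With `finalize=True`, the leaky variant fails on its first step instead of slowly consuming RAM, which is the point of finalizing the real graph before training.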

khs8727 commented 4 years ago

Hi @dr-pato, I found that the TFRecords loading code grabs system memory while loading data onto the GPU and does not release it properly. I am new to TensorFlow, so I couldn't figure out exactly where the problem is. Instead, I removed all the TFRecords code and fed .npy files directly into the TensorFlow data pipeline (saving all data as .npy instead of TFRecords). RAM usage still grows a little during training, but it is much lower than before and training now runs without problems. Thanks for your tip!!
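One way to feed .npy files into a training loop without loading the whole dataset into RAM is to memory-map the arrays with `np.load(..., mmap_mode="r")` and read one mini-batch at a time. This is a sketch under assumptions: it uses a hypothetical layout of one `features.npy` and one `targets.npy` per split, and `save_split` and `npy_batches` are illustrative names, not code from the repository or the exact approach used in this thread.

```python
import os
import tempfile

import numpy as np


def save_split(out_dir, features, targets):
    """Store one split as two .npy files (hypothetical layout: features.npy, targets.npy)."""
    os.makedirs(out_dir, exist_ok=True)
    np.save(os.path.join(out_dir, "features.npy"), features)
    np.save(os.path.join(out_dir, "targets.npy"), targets)


def npy_batches(out_dir, batch_size, shuffle=True, seed=0):
    """Yield (features, targets) mini-batches from memory-mapped .npy files."""
    # mmap_mode="r": the arrays stay on disk; only the rows indexed below are read.
    x = np.load(os.path.join(out_dir, "features.npy"), mmap_mode="r")
    y = np.load(os.path.join(out_dir, "targets.npy"), mmap_mode="r")
    if len(x) != len(y):
        raise ValueError("features and targets must have the same number of samples")
    order = np.arange(len(x))
    if shuffle:
        np.random.default_rng(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        # Sorting the indices inside a batch makes the disk reads more sequential.
        idx = np.sort(order[start:start + batch_size])
        # Fancy indexing copies just these rows into ordinary in-memory arrays.
        yield np.asarray(x[idx]), np.asarray(y[idx])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        save_split(tmp, np.zeros((10, 3), np.float32), np.ones(10, np.float32))
        for bx, by in npy_batches(tmp, batch_size=4):
            print(bx.shape, by.shape)
```

In TF 1.14 a generator like this could be wrapped with `tf.data.Dataset.from_generator(lambda: npy_batches(path, 8), output_types=(tf.float32, tf.float32))`, where `path` is again a placeholder. Memory-mapping means only the rows actually indexed are read from disk rather than the whole array.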