facebookarchive / fb.resnet.torch

Torch implementation of ResNet from http://arxiv.org/abs/1512.03385 and training scripts

Fix memory leak when saving models #163

Closed. davidemaz closed this 7 years ago

davidemaz commented 7 years ago

When saving checkpoints, the amount of CPU RAM used increases with every save. It seems that the garbage collector doesn't free the unreferenced memory. Copying each tensor directly to CPU RAM fixed the problem for me. I think issue #109 could be caused by this: as the occupied memory grows, the process freezes and the OS kills it, just as @DmitryUlyanov described. Commit d4f53da saves some more memory, which is useful if you have big models and not much RAM available.
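For illustration, here is a minimal sketch of the "copy each tensor to CPU RAM" idea. It is not the code from this pull request. The helper name `copyToCPU` is hypothetical, and the sketch assumes the model is an ordinary tree of Lua tables (no reference cycles) whose GPU tensors are of type `torch.CudaTensor`.

```lua
require 'torch'

-- Hypothetical helper (not part of fb.resnet.torch): recursively copy a
-- table such as an nn module, moving every torch.CudaTensor to CPU RAM.
-- Assumptions: no cyclic references; storage sharing between tensors
-- is not preserved in the copy.
local function copyToCPU(obj)
   if torch.isTensor(obj) then
      if torch.typename(obj) == 'torch.CudaTensor' then
         -- :float() allocates a new FloatTensor in CPU memory
         return obj:float()
      end
      -- non-CUDA tensors are returned as-is (same object, not duplicated)
      return obj
   end
   if type(obj) ~= 'table' then
      -- numbers, strings, booleans, functions: returned unchanged
      return obj
   end
   local copy = {}
   for k, v in pairs(obj) do
      copy[k] = copyToCPU(v)
   end
   -- restore the Torch class (e.g. 'nn.Sequential') on the copied table
   local typename = torch.typename(obj)
   if typename then
      torch.setmetatable(copy, typename)
   end
   return copy
end
```

Under these assumptions, a checkpoint could then be written with something like `torch.save(path, copyToCPU(model))`, so that only CPU-side copies are serialized and the GPU model is left untouched.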

davidemaz commented 7 years ago

I think this memory leak was due to a bug in the CUDA 8 pre-release. With the stable version I no longer see the problem.