
ThibaultGROUEIX / AtlasNet

This repository contains the source code for the paper "AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation". The network is able to synthesize a mesh (point cloud + connectivity) from a low-resolution point cloud, or from an image.

http://imagine.enpc.fr/~groueixt/atlasnet/

MIT License

671 stars, 118 forks

Memory Leak #12

Closed (liuyuan-pal closed this issue 5 years ago)

liuyuan-pal commented 6 years ago

I found that the unused self.dist1 and self.dist2 in the file "nndistance/functions/nnd.py" cause a memory leak in my environment (Python 3.5.2 with PyTorch 0.4.0).

class NNDFunction(Function):
    def forward(self, xyz1, xyz2):
        dist1, dist2 = cuda_compute_from(xyz1, xyz2)
        # following two lines cause memory leak
        self.dist1 = dist1
        self.dist2 = dist2
        return dist1, dist2

    def backward(self, graddist1, graddist2):
        gradxyz1, gradxyz2 = grad_cuda_compute_from(graddist1, graddist2)
        return gradxyz1, gradxyz2

liuyuan-pal commented 6 years ago

I found the reason. In PyTorch 0.4.0, the function should use

ctx.save_for_backward()

Thank you! AtlasNet is a very nice work~!
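
Editor's sketch of the pattern mentioned above: a custom autograd function written with static methods and a ctx object, keeping the tensors needed by backward through ctx.save_for_backward() instead of attributes on self. This is not the repository's code. The brute-force distance computation in plain PyTorch is an assumed stand-in for the CUDA kernels (cuda_compute_from and grad_cuda_compute_from in the post are pseudocode names), so it runs on CPU.

```python
import torch
from torch.autograd import Function


class NNDFunction(Function):
    """Squared nearest-neighbour distances between two point sets.

    Illustrative stand-in: the distances are computed by brute force in
    plain PyTorch rather than by a CUDA extension.
    """

    @staticmethod
    def forward(ctx, xyz1, xyz2):
        # xyz1: (B, N, 3), xyz2: (B, M, 3)
        diff = xyz1.unsqueeze(2) - xyz2.unsqueeze(1)  # (B, N, M, 3)
        sq = (diff ** 2).sum(-1)                      # (B, N, M)
        dist1, idx1 = sq.min(dim=2)                   # nearest xyz2 point per xyz1 point
        dist2, idx2 = sq.min(dim=1)                   # nearest xyz1 point per xyz2 point
        # Keep what backward needs on ctx; nothing is stored on self.
        ctx.save_for_backward(xyz1, xyz2, idx1, idx2)
        return dist1, dist2

    @staticmethod
    def backward(ctx, graddist1, graddist2):
        xyz1, xyz2, idx1, idx2 = ctx.saved_tensors
        i1 = idx1.unsqueeze(-1).expand(-1, -1, 3)     # (B, N, 3)
        i2 = idx2.unsqueeze(-1).expand(-1, -1, 3)     # (B, M, 3)
        # Gradient of dist1 w.r.t. xyz1 and of dist2 w.r.t. xyz2.
        g1 = 2.0 * graddist1.unsqueeze(-1) * (xyz1 - torch.gather(xyz2, 1, i1))
        g2 = 2.0 * graddist2.unsqueeze(-1) * (xyz2 - torch.gather(xyz1, 1, i2))
        # Each distance also depends on its matched neighbour, with opposite sign.
        gradxyz1 = g1.clone().scatter_add_(1, i2, -g2)
        gradxyz2 = g2.clone().scatter_add_(1, i1, -g1)
        return gradxyz1, gradxyz2
```

With this style the function is called as NNDFunction.apply(xyz1, xyz2) rather than by instantiating the class, so there is no long-lived function object holding on to dist1 and dist2.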

ThibaultGROUEIX commented 5 years ago

Indeed @liuyuan-pal! I updated the code to PyTorch v1 and fixed the memory leak with an ugly hack: explicitly destroying the custom chamfer distance layer after the backward call. It seems dirty though. I'll look into your solution, thanks for pointing it out! Best, Thibault

ThibaultGROUEIX commented 5 years ago

Awesome, really good pointer! It's fixed! Thanks @liuyuan-pal

nghorbani commented 5 years ago

So this memory leak issue is still there. Do you have any insight into what might be the cause?

ThibaultGROUEIX commented 5 years ago

Hi @nghorbani, can you share your setup (PyTorch, CUDA) and the exact command that's failing for you, please? Best, Thibault

nghorbani commented 5 years ago

Hi Thibault, I see that you have tested the code with pytorch 0.4.1, py37 and cuda 9.0.176, 7.1.2_2. However, my setup is pytorch 1.1.0, py3.7, cuda 10.0, and cudnn-7.5. Could that be version differences? If so, have you considered upgrading the code for a more recent framework? I am using chamferDist the way the tutorials suggest, except that I am using PyTorch's LBFGS optimizer. Best, Nima

ThibaultGROUEIX commented 5 years ago

"Could that be version differences?" No... I have tested the code with the latest source. Can you share a minimal code snippet that fails for you, please? Cheers, Thibault
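
Editor's sketch of what a minimal snippet for the setup described above might look like. It is not nghorbani's code and does not reproduce the reported leak. It only shows the general shape: torch.optim.LBFGS needs a closure that re-evaluates the loss, and allocated GPU memory can be logged after each step to see whether it grows. The chamfer_distance function here is a hypothetical pure-PyTorch stand-in for the repository's chamferDist layer, and fit_points, steps and seed are illustrative names.

```python
import torch


def chamfer_distance(xyz1, xyz2):
    # Hypothetical pure-PyTorch stand-in for the CUDA chamfer layer.
    sq = ((xyz1.unsqueeze(2) - xyz2.unsqueeze(1)) ** 2).sum(-1)  # (B, N, M)
    return sq.min(dim=2)[0].mean() + sq.min(dim=1)[0].mean()


def fit_points(target, steps=5, seed=0):
    """Fit a free point cloud to `target` with LBFGS, logging loss and memory."""
    torch.manual_seed(seed)
    pred = torch.randn_like(target).requires_grad_(True)
    optimizer = torch.optim.LBFGS([pred], max_iter=10, line_search_fn="strong_wolfe")
    losses, allocated = [], []

    def closure():
        # LBFGS may call this several times per step.
        optimizer.zero_grad()
        loss = chamfer_distance(pred, target)
        loss.backward()
        return loss

    for _ in range(steps):
        loss = optimizer.step(closure)
        # .item() stores a Python float, so no graph is kept alive by the log.
        losses.append(loss.item())
        # Bytes currently allocated on the GPU; 0 when running on CPU.
        allocated.append(torch.cuda.memory_allocated() if pred.is_cuda else 0)
    return pred.detach(), losses, allocated
```

If the values in allocated keep rising across steps on a GPU run, swapping the stand-in for the layer under test would help narrow down where the growth comes from.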