NVlabs / nvdiffrec

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

Large gradient at mesh vertices #144

Open Ra1nbowChan opened 9 months ago

Ra1nbowChan commented 9 months ago

Hi! Thanks for your great work! I found that during training, the mesh vertex tensor sometimes receives a very large gradient, and the gradient value is usually of the form 2^n (maybe 4^n, I'm not sure) for some integer n. To reproduce, just add

        # geometry/dmtet.py L210: keep a handle to the vertex tensor and
        # retain its gradient (v_pos is a non-leaf tensor, so .grad is
        # discarded by default)
        self.mesh_verts = opt_mesh.v_pos
        if self.mesh_verts.requires_grad:
            self.mesh_verts.retain_grad()

        # train.py L443 (inside the training loop, after loss.backward()):
        # break into the debugger when a large vertex gradient appears
            if geometry.mesh_verts.grad is not None and geometry.mesh_verts.grad.max() > 100.:
                import ipdb; ipdb.set_trace()

and run the example command:

python train.py --config configs/bob.json

Then the program will stop at the breakpoint when a large gradient occurs. Such a large gradient is harmful when the SDF is parametrized by an MLP, since the MLP collapses after the optimizer step. I've tested on Windows 10 with MSVC 14.35.32215 and torch 2.0+cu11.8 / torch 1.13.0+cu11.6. I didn't test on CUDA 11.3, since I currently can't find a way to install the corresponding version of tinycudann on Windows. Any advice? Thanks!
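
For reference, a minimal check that can be run at the ipdb breakpoint to verify whether the large gradient entries really are exact powers of two (the helper name and the 100.0 threshold are just for illustration, not from the repo):

import torch

def check_power_of_two(grad: torch.Tensor, threshold: float = 100.0):
    # torch.frexp decomposes |x| as mantissa * 2**exponent with mantissa
    # in [0.5, 1); an exact power of two therefore has mantissa == 0.5
    large = grad[grad.abs() > threshold]
    mantissa, exponent = torch.frexp(large.abs())
    is_pow2 = mantissa == 0.5
    print(f"{is_pow2.sum().item()} / {large.numel()} large entries are exact powers of two")
    print("exponents:", exponent[is_pow2].unique().tolist())

# at the breakpoint:
# check_power_of_two(geometry.mesh_verts.grad)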

YuxuanSnow commented 2 months ago

That's a very interesting observation! I also often run into the issue that no mesh can be extracted. You can try adding clip_grad_norm_, which rescales the gradients so their total norm does not exceed a given value:

optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
loss.backward()

# rescale all gradients in place so their global L2 norm is at most args.clip
# (clip_grad_norm is deprecated; use the in-place variant clip_grad_norm_)
torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
optimizer.step()

For more discussion, see https://github.com/pytorch/pytorch/issues/309
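
Alternatively, since the huge values only show up on the vertex tensor, you could clamp its gradient per element with a backward hook instead of rescaling all parameters. A minimal sketch (the ±100 clamp range is an arbitrary choice, not a tuned value):

# e.g. in geometry/dmtet.py, right after opt_mesh is built:
# clamp each component of the incoming vertex gradient to [-100, 100]
# before it propagates back to the SDF MLP
if opt_mesh.v_pos.requires_grad:
    opt_mesh.v_pos.register_hook(lambda g: g.clamp(-100.0, 100.0))

Unlike clip_grad_norm_, this clips only the outlier components and leaves well-behaved gradients untouched.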