facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
https://pytorch3d.org/

NaN when using MeshRasterizer #561

Open pengsongyou opened 3 years ago

pengsongyou commented 3 years ago

Description

I installed the latest pytorch3d 0.4 and tried to run the fit_textured_mesh tutorial, under the "Mesh prediction via silhouette rendering" section. The loss becomes NaN after around 200 iterations (I can reproduce this in 4 out of 5 runs).

I also tried pytorch3d 0.3 (built from source in December), and this issue never happened. Therefore, there might be an issue with MeshRasterizer in the latest update.

Reproduce

Install pytorch 1.7.1

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

Install pytorch3d following the "wheels for Linux" instructions

pip install pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu102_pyt171/download.html

Then simply run the fit_textured_mesh tutorial and you should be able to reproduce the result. I get the NaN in 4 out of 5 runs.

Best, Songyou

nikhilaravi commented 3 years ago

Thanks @pengsongyou for reporting this issue! We'll look into it asap.

nikhilaravi commented 3 years ago

@pengsongyou I was able to reproduce the error. To resolve the issue in the tutorial add perspective_correct=False in the RasterizationSettings for the rasterizer. In v0.4 we changed this to be automatically inferred from the camera type but there seems to be some instability due to this. We will debug what is happening!
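
For reference, the workaround described above might look like the following minimal sketch. It assumes pytorch3d is installed; the values for image_size, blur_radius and faces_per_pixel are illustrative placeholders, not the tutorial's required values. If a script creates more than one RasterizationSettings instance, each one presumably needs the flag.

```python
from pytorch3d.renderer import RasterizationSettings

# Illustrative values (assumptions): keep whatever image_size, blur_radius
# and faces_per_pixel your existing settings already use.
raster_settings = RasterizationSettings(
    image_size=128,
    blur_radius=0.0,
    faces_per_pixel=1,
    perspective_correct=False,  # workaround: do not infer this from the camera type
)
```

The resulting object is then passed to MeshRasterizer through its raster_settings argument, as usual.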

pengsongyou commented 3 years ago

Great, now it indeed seems to be working, thanks a lot! I have always been using the perspective camera model, but I did not need to set perspective_correct=False when I was using 0.3 because no issue was found. Just wondering if you could explain why we need to set it to False explicitly now in 0.4?

Thanks so much in advance!

Best, Songyou

nikhilaravi commented 3 years ago

@pengsongyou the perspective_correct setting basically ensures that the barycentric coordinates are correct under a perspective camera. This is not corrected in other differentiable renderers like SoftRas/NMR/DIB-R which assume that the perspective effects are small. In the previous version of PyTorch3D this was an optional setting but in the most recent release we decided to set it based on the type of the camera. We will investigate why this is causing nans in the optimization.
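
To make the idea of "correct barycentric coordinates under a perspective camera" concrete, here is a minimal pure-Python sketch of the standard perspective-correction formula: each screen-space weight is divided by its vertex depth and the result is renormalised. The function name perspective_correct_bary is hypothetical, and this is not claimed to be PyTorch3D's actual implementation.

```python
def perspective_correct_bary(bary, depths):
    """Convert screen-space barycentric weights to perspective-correct ones.

    bary:   (b0, b1, b2) barycentric weights measured in the image plane.
    depths: (z0, z1, z2) camera-space depths of the three triangle vertices.
    """
    # Divide each screen-space weight by the depth of its vertex ...
    scaled = [b / z for b, z in zip(bary, depths)]
    # ... then renormalise so the corrected weights sum to 1.
    total = sum(scaled)
    return tuple(s / total for s in scaled)


# Equal depths: there is no perspective effect, so the weights are unchanged.
print(perspective_correct_bary((0.2, 0.3, 0.5), (2.0, 2.0, 2.0)))
# Unequal depths: weight shifts toward the vertex that is closer to the camera.
print(perspective_correct_bary((0.5, 0.5, 0.0), (1.0, 3.0, 1.0)))
```

Both steps involve a division. In floating-point tensor code, a depth or a normalising sum that reaches zero yields inf or NaN rather than an exception, which is one plausible way such a correction could become unstable.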

JudyYe commented 3 years ago

Hi, I have encountered a similar NaN error in the rasterizer :/. I just want to provide another example that might help the team debug. As of right now, perspective_correct=False or an orthographic camera solves this particular case (thanks Nikhila and Georgia).

The NaN seems to happen when a rendered face is parallel to the ray (maybe relevant to the previous issue #110). I have provided the triangle that caused NaN fragments in the file triangle.pkl, together with my script:

    import pickle

    import numpy as np
    from pytorch3d.renderer import (
        BlendParams,
        MeshRasterizer,
        PerspectiveCameras,
        RasterizationSettings,
    )

    fname = 'triangle.pkl'
    device = 'cuda:0'
    # Load the single-triangle mesh that triggers the NaNs.
    with open(fname, 'rb') as fp:
        obj = pickle.load(fp)
    triangle = obj['tri'].to(device)

    cameras = PerspectiveCameras(focal_length=100., device=device)
    blend_params = BlendParams(sigma=1e-4, gamma=1e-4)
    dist_eps = 1e-6
    raster_settings = RasterizationSettings(
        image_size=224,
        blur_radius=np.log(1. / dist_eps - 1.) * blend_params.sigma,
        faces_per_pixel=100,
        # perspective_correct=False,  # this seems to solve the NaN error, at least for this case
    )
    rasterizer = MeshRasterizer(cameras=cameras, raster_settings=raster_settings).to(device)
    fragments = rasterizer(triangle)
    print(fragments.zbuf.isnan().any(), fragments.bary_coords.isnan().any())
    # True, True for me

The triangle looks like this in 3D: [image: 3d], and like this in screen space: [image: 2d].

Visualization code:

    import pickle

    import matplotlib.pyplot as plt
    import numpy as np
    from mpl_toolkits import mplot3d  # noqa: F401 (registers the '3d' projection on older matplotlib)

    fname = '/tmp/transfer/vis/triangle.pkl'
    with open(fname, 'rb') as fp:
        triangle = pickle.load(fp)
    verts = triangle['verts']           # vertices in 3D
    verts2d = triangle['verts_screen']  # the same vertices in screen space

    def refract_verts(verts):
        # Append the first vertex again so the plotted outline is closed.
        return np.vstack([verts, verts[0:1]])

    verts = refract_verts(verts)
    verts2d = refract_verts(verts2d)

    # 3D view of the triangle.
    fig = plt.figure()
    ax = plt.axes(projection='3d')
    ax.set_xlabel('X axis')
    ax.set_ylabel('Y axis')
    ax.set_zlabel('Z axis')
    ax.plot3D(verts[:, 0], verts[:, 1], verts[:, 2], 'gray')

    # 2D view in screen space.
    fig = plt.figure()
    plt.plot(verts2d[:, 0], verts2d[:, 1])
    plt.show()

Thanks and good luck.
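
As a side note on the blur_radius expression in the script above, the following worked example shows what it evaluates to. It assumes the soft silhouette probability of a pixel at distance d from a face is modelled as sigmoid(-d / sigma); that model and the helper name soft_probability are assumptions made for illustration. Under that assumption, blur_radius is the distance at which the probability falls to dist_eps.

```python
import math

sigma = 1e-4      # blend sharpness, as in BlendParams(sigma=1e-4)
dist_eps = 1e-6   # smallest probability that should still be rasterized

# Same expression as in the script, with math.log in place of np.log.
blur_radius = math.log(1.0 / dist_eps - 1.0) * sigma


def soft_probability(dist, sigma):
    """Assumed soft-silhouette model: sigmoid(-dist / sigma)."""
    return 1.0 / (1.0 + math.exp(dist / sigma))


print(blur_radius)                           # roughly 1.38e-3
print(soft_probability(blur_radius, sigma))  # roughly dist_eps
```

So a smaller dist_eps or a larger sigma widens the band around each face that the rasterizer has to consider.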

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 3 years ago

This issue was closed because it has been stalled for 5 days with no activity.

nikhilaravi commented 3 years ago

I will look into this issue! Thanks for the explanation @JudyYe.

tals commented 3 years ago

~Hey - I am experiencing the same issue (nans after about 200 iteration steps). perspective_correct=False doesn't seem to help though :(~

EDIT: I didn't notice there were multiple RasterizationSettings instances. Works now!

$ pip list | grep torch
pytorch3d                         0.4.0
torch                             1.7.1+cu110

jbohnslav commented 3 years ago

I can confirm both that this bug still exists in 0.5.0, and that setting perspective_correct=False removes the issue. I ran my code with torch anomaly detection on, not sure if it's helpful. Here's the relevant portion of the anomaly detection output:

  File "/home/jim/Documents/python/pytorch3d/pytorch3d/renderer/mesh/renderer.py", line 59, in forward
    fragments = self.rasterizer(meshes_world, **kwargs)
  File "/home/jim/anaconda3/envs/armo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jim/Documents/python/pytorch3d/pytorch3d/renderer/mesh/rasterizer.py", line 171, in forward
    pix_to_face, zbuf, bary_coords, dists = rasterize_meshes(
  File "/home/jim/Documents/python/pytorch3d/pytorch3d/renderer/mesh/rasterize_meshes.py", line 231, in rasterize_meshes
    pix_to_face, zbuf, barycentric_coords, dists = _RasterizeFaceVerts.apply(
 (function _print_stack)

And the error message:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_89528/4272499749.py in <module>
     80     optimizer.zero_grad()
---> 81     loss.backward()
     82     optimizer.step()

~/anaconda3/envs/armo/lib/python3.8/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    219                 retain_graph=retain_graph,
    220                 create_graph=create_graph)
--> 221         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    222 
    223     def register_hook(self, hook):

~/anaconda3/envs/armo/lib/python3.8/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
    128         retain_graph = create_graph
    129 
--> 130     Variable._execution_engine.run_backward(
    131         tensors, grad_tensors_, retain_graph, create_graph,
    132         allow_unreachable=True)  # allow_unreachable flag

RuntimeError: Function '_RasterizeFaceVertsBackward' returned nan values in its 0th output.

dukleryoni commented 3 years ago

Hi, same here: perspective_correct=False fixes the issue. In my setup, I also find that (even with perspective_correct=True) if the mesh and renderer are on the CPU, I no longer get NaNs.

Additionally, I was also wondering if having FoVPerspectiveCameras camera + perspective_correct=False for the rasterization setting is equivalent to having a weak perspective camera?

rubenverhack commented 3 years ago

I can confirm that this bug is present in v0.5.0 using the out of the box tutorial "camera_position_optimization_with_differentiable_rendering". perspective_correct=False fixes the issue.

FavorMylikes commented 2 years ago

pytorch                   1.9.1           py3.9_cuda11.1_cudnn8.0.5_0    pytorch
pytorch3d                 0.6.0                    pypi_0    pypi

RasterizationSettings(image_size=(h, w),
                      blur_radius=0,
                      faces_per_pixel=1,
                      perspective_correct=False)
# with
PerspectiveCameras()

This issue is still there. I use

(render_images != render_images).sum()

to check.
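
The check above relies on NaN being the only floating-point value that is not equal to itself. A minimal pure-Python sketch of the same idea follows; the helper name count_nans and the sample values are made up for illustration.

```python
import math


def count_nans(values):
    """Count NaN entries, using the fact that NaN != NaN."""
    return sum(1 for v in values if v != v)


pixels = [0.25, math.nan, 1.0, math.nan, 0.0]
print(count_nans(pixels))  # prints 2
```

On a torch tensor, torch.isnan(render_images).sum() expresses the same check more explicitly.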

bottler commented 2 years ago

@FavorMylikes Yes indeed! The PR was merged after the latest release, 0.6.0.

simon-cross commented 2 years ago

I have installed PyTorch3D with the following command: pip3 install "git+https://github.com/facebookresearch/pytorch3d.git" and I still get NaN when running the tutorial fit_textured_mesh.py

bottler commented 2 years ago

@simon-cross Do you still have the output of that command? It's quite likely something's gone wrong and you've ended up with the current release not the latest code. Maybe best to have this conversation on a new issue. A new release 0.6.1 is imminent which should include the fix btw.

simon-cross commented 2 years ago

I have created the following issue related to the NaN problem: issue 991

TimmmYang commented 2 years ago

Hello, the NaN problem still exists. In my case, I use RasterizationSettings as follows:

raster_settings = RasterizationSettings(
    image_size=(self.img_h, self.img_w),
    blur_radius=0,
    faces_per_pixel=1,
    perspective_correct=False,
)

My environment:

# Name                    Version                   Build  Channel
pytorch                   1.8.1           py3.7_cuda11.1_cudnn8.0.5_0    pytorch
pytorch3d                 0.6.1                     dev_0    <develop>

d4l3k commented 2 years ago

I'm also running into periodic NaNs w/ the mesh rasterizer. Seems to occur with the HardDepthShader in https://github.com/facebookresearch/pytorch3d/pull/1208 which is about as simple as you can get shading wise

d4l3k commented 2 years ago

I turned on anomaly detection and traced those NaNs back to transform_points denom correction in

https://github.com/facebookresearch/pytorch3d/blob/main/pytorch3d/renderer/mesh/rasterizer.py#L193-L196

It might be a good idea to change eps so it's not None. It seems to be set explicitly in a lot of places, so None seems like a bad default given the potential bad behavior. I set it to eps=1e-8 and that seems to have solved it. Implicitron looks like it sets it to 1e-2, which seems very large:

https://github.com/facebookresearch/pytorch3d/blob/7978ffd1e4819d24803b01a1147a2c33ad97c142/pytorch3d/implicitron/tools/point_cloud_utils.py#L73
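
To illustrate why a None default for eps can produce NaNs, here is a minimal pure-Python sketch of a homogeneous divide with an optional clamp on the denominator. The function names homogeneous_divide and ieee_div are hypothetical, and the clamp is only a sketch of the general technique, not a copy of the linked PyTorch3D code. ieee_div exists because plain Python raises on division by zero, whereas tensor code returns inf or NaN.

```python
import math


def ieee_div(num, den):
    """Divide the way floating-point tensor code does: no exception on x / 0."""
    if den != 0.0:
        return num / den
    if num == 0.0 or math.isnan(num):
        return math.nan  # 0 / 0 is NaN
    return math.copysign(math.inf, num) * math.copysign(1.0, den)


def homogeneous_divide(xyz, w, eps=None):
    """Divide projected coordinates by w, optionally clamping |w| to at least eps."""
    denom = w
    if eps is not None:
        # Keep the sign of w (treating 0 as positive) and clamp its magnitude.
        sign = -1.0 if w < 0.0 else 1.0
        denom = sign * max(abs(w), eps)
    return tuple(ieee_div(c, denom) for c in xyz)


print(homogeneous_divide((0.0, 0.0, 0.0), 0.0))            # eps=None: all NaN
print(homogeneous_divide((0.0, 0.0, 0.0), 0.0, eps=1e-8))  # clamped: finite
```

With the clamp, a degenerate denominator yields large but finite coordinates, so the NaN never enters the computation. The trade-off is that a large eps (such as 1e-2) also distorts points whose true denominator is small but valid.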

srph25 commented 1 year ago

I encountered a similar problem and I think d4l3k is right. Calling renderer(meshes, eps=1e-8) or similarly for point clouds solved the issue for me.

relh commented 11 months ago

> I encountered a similar problem and I think d4l3k is right. Calling renderer(meshes, eps=1e-8) or similarly for point clouds solved the issue for me.

This solved my problem too! Thanks so much, as setting perspective_correct=False didn't do it.