NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

Antialiasing memory leak? #153

Closed odusseys closed 9 months ago

odusseys commented 11 months ago

I am using the following code snippet to render a scene:

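import kaolin as kal
import nvdiffrast.torch
import torch

# glctx, material_color, background_color and random_camera_and_light are defined elsewhere.
def render_untextured(mesh, cam=None, light_dir=None):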
  if cam is None or light_dir is None:
    cam, light_dir = random_camera_and_light()
  # Rasterize the mesh.
  vertices_camera = cam.extrinsics.transform(mesh.vertices)

  proj = cam.projection_matrix()[None]
  homogeneous_vecs = kal.render.camera.up_to_homogeneous(
      vertices_camera
  )[..., None]

  vertices_clip = (proj @ homogeneous_vecs).squeeze(-1).contiguous()
  faces = mesh.faces.int().contiguous()
  rast_out = nvdiffrast.torch.rasterize(
      glctx, vertices_clip, faces, (cam.height, cam.width)
  )[0]

  # Compute rasterized normals and apply material with light direction
  normals = mesh.vertex_normals
  vertex_shade = torch.sum(normals * light_dir, dim=2, keepdim=True)

  mask=torch.full((1, normals.shape[1], 1), 1).to("cuda")

  features = torch.cat((vertex_shade, mask), dim=2)
  rendered = nvdiffrast.torch.interpolate(features, rast_out, faces)[0]

  shade = rendered[:,:,:, :1]
  hard_mask = rendered[:,:,:, 1:]

  res = hard_mask * material_color * shade + (1 - hard_mask) * background_color
  res = torch.clamp(res, 0, 1)

  # ENABLING THE LINE BELOW CAUSES THE MEMORY LEAK
  # res = nvdiffrast.torch.antialias(res.contiguous(), rast_out, vertices_clip, faces, pos_gradient_boost=3)

  return torch.squeeze(res).to("cuda")

Here, cam is a Kaolin camera, but you can assume that everything from proj = ... onward is standard.

If I enable the commented-out line that performs the antialiasing step, I get an immediate memory leak: after rendering a couple of images, CUDA memory is full, and emptying the CUDA cache or running garbage collection frees only a fraction of the used memory. With the antialiasing step removed, there is no noticeable GPU memory usage whatsoever (T4 card).

For reference - I am using

s-laine commented 11 months ago

I don't see a reason why this would leak memory, so there may be something going on outside this function. The antialias op stores a couple of temporary objects for the gradient pass, and if you accidentally hold onto these across iterations, that might explain increasing memory consumption — see the first item here.

However, these objects shouldn't be big enough to immediately fill the GPU memory, unless your mesh has a massive number of triangles (tens or hundreds of millions).
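To illustrate the kind of accidental retention described above, here is a minimal sketch in plain PyTorch. It does not use nvdiffrast at all: `step` is a hypothetical stand-in for a render plus loss computation, and the two history lists are made up for the example. The point is only that storing a tensor that still carries its autograd graph keeps every buffer saved for the gradient pass alive, while storing a plain number or a detached tensor does not.

```python
import torch


def step(weights):
    # Hypothetical stand-in for "render an image and compute a loss".
    x = torch.randn(64, 64)
    return (weights * x).pow(2).mean()


weights = torch.ones(64, 64, requires_grad=True)

leaky_history = []  # holds tensors that still reference their autograd graph
safe_history = []   # holds plain Python floats, nothing is retained

for _ in range(3):
    loss = step(weights)
    leaky_history.append(loss)        # keeps the graph and its saved buffers alive
    safe_history.append(loss.item())  # copies the value out, graph can be freed

# Every tensor in the leaky list still points at a backward node.
holds_graph = [t.grad_fn is not None for t in leaky_history]
print(holds_graph)
```

If tensors rather than numbers need to be kept across iterations, calling `.detach()` on them before storing has the same effect of cutting the link to the graph.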

The function above looks fine to me, so hunting this down would require a standalone reproducer. If this is a bug in the antialiasing op itself, it has to be some sort of corner case, or the inputs have to be extreme in some way that hasn't come up in other projects that have used this op.
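As a starting point for such a reproducer, a small harness along these lines could show whether allocated memory actually grows from one call to the next. This is only a sketch: `track_allocated` and `dummy_step` are hypothetical names, and the dummy step would be replaced by a call to the render function with fixed inputs. On a machine without CUDA the readings are simply zero.

```python
import gc

import torch


def track_allocated(step_fn, iterations=5):
    """Call step_fn repeatedly and record allocated CUDA bytes after each call.

    step_fn is a hypothetical zero-argument callable, e.g. one render call.
    """
    readings = []
    for _ in range(iterations):
        result = step_fn()
        del result    # drop the harness's own reference to the output
        gc.collect()  # collect unreachable Python objects
        if torch.cuda.is_available():
            torch.cuda.synchronize()
            torch.cuda.empty_cache()  # return cached blocks to the driver
            readings.append(torch.cuda.memory_allocated())
        else:
            readings.append(0)  # no CUDA device, nothing to measure
    return readings


def dummy_step():
    # Placeholder workload; swap in the real render call here.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return torch.zeros(256, 256, device=device)


readings = track_allocated(dummy_step)
print(readings)
```

Readings that stay flat suggest the growth comes from references held outside the function under test; readings that climb with every call point at the function itself.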