Cannot run renderer on any GPU other than GPU 0

Hi Nikos and team,

Thank you for the amazing work! I was trying to run the renderer in a distributed manner and kept getting illegal memory access errors, so I ran with CUDA_LAUNCH_BLOCKING=1 to find out where the error occurs.
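For anyone trying to reproduce this, here is a minimal sketch of one way to enable synchronous kernel launches from inside a script rather than from the shell. It assumes the variable is set before anything initializes CUDA, since the CUDA runtime reads it at initialization.

```python
import os

# Set this before the first CUDA call; once the CUDA context exists,
# changing the variable has no effect on the running process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Import torch (and any CUDA extension) only after this point, so that
# errors are reported at the kernel launch that actually caused them.
```

Setting the variable in the shell when launching the script is equivalent.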

Everything works on GPU 0, but as soon as I switch to any other GPU I get a segmentation fault (SIGSEGV). Here is a minimal example that reproduces the error:

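import torch
import trimesh
from neural_renderer import Renderer  # assumed import path for Renderer

def func():
    dev = torch.device(1)  # any GPU other than 0 triggers the crash
    mesh = trimesh.load_mesh('/path/to/mesh.obj')
    v, f = mesh.vertices, mesh.faces
    # Move to the target GPU and replicate into a batch of 100
    v = torch.from_numpy(v).to(dev)[None].repeat(100, 1, 1)
    v = v.float()
    f = torch.from_numpy(f).to(dev)[None].repeat(100, 1, 1)
    r = Renderer(camera_mode='look')
    img = r.render_silhouettes(v, f)  # fails here when dev is not GPU 0
    input()  # pause before printing
    print(img.shape, img.dtype, img.min(), img.max())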

The first error occurs in the rasterize_cuda.forward_face_index_map function, in kernel 1. Initially I thought this was because some intermediate variables inside the Renderer class are created on GPU 0, but moving them to the target device changes nothing (I made sure that every tensor passed to rasterize_cuda.forward_face_index_map is on cuda:1).
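A device check along these lines can confirm the tensor placement described above. This is only a sketch: the helper names `report_devices` and `off_device` are hypothetical, and the check is duck-typed (anything exposing a `.device` attribute counts as a tensor), so it does not depend on the renderer's internals.

```python
def report_devices(named_values):
    """Map each name to str(value.device) for values that expose a .device attribute."""
    return {
        name: str(value.device)
        for name, value in named_values.items()
        if hasattr(value, "device")
    }


def off_device(named_values, expected):
    """Return the sorted names whose device string differs from `expected`."""
    return sorted(
        name
        for name, dev in report_devices(named_values).items()
        if dev != expected
    )


# Hypothetical usage with the repro above (r, v, f as defined there):
#   off_device({"vertices": v, "faces": f, **vars(r)}, "cuda:1")
# An empty list means every tensor-like value found is on cuda:1.
```

This only inspects where tensors live. It says nothing about the process's current CUDA device, which PyTorch leaves at 0 unless torch.cuda.set_device or a torch.cuda.device context changes it.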

Could you tell me what could be happening here? Thanks!

daniilidis-group / neural_renderer, issue #135