Hi Nikos and team,

Thank you for the amazing work! I was trying to run the renderer in a distributed manner and kept getting illegal memory access errors, so I ran with `CUDA_LAUNCH_BLOCKING=1` to pinpoint where the error occurs.
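Concretely, I invoked the script like this (`repro.py` is just a placeholder name for the minimal example below), so that kernel launches are synchronous and the reported failure point is accurate:

```shell
# Synchronous kernel launches: the error surfaces at the kernel that actually failed
CUDA_LAUNCH_BLOCKING=1 python repro.py
```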
Everything works on GPU 0, but as soon as I switch to any other GPU I get a SIGSEGV. Here is a minimal example that reproduces the error:
```python
import torch
import trimesh
from neural_renderer import Renderer

def func():
    dev = torch.device(1)
    mesh = trimesh.load_mesh('/path/to/mesh.obj')
    v, f = mesh.vertices, mesh.faces
    v = torch.from_numpy(v).to(dev)[None].repeat(100, 1, 1)
    v = v.float()
    f = torch.from_numpy(f).to(dev)[None].repeat(100, 1, 1)
    r = Renderer(camera_mode='look')
    img = r.render_silhouettes(v, f)
    input()
    print(img.shape, img.dtype, img.min(), img.max())
```
The first point of failure is the `rasterize_cuda.forward_face_index_map` function, in kernel 1. Initially I thought this was because some intermediate variables inside the `Renderer` class are defined on GPU 0, but moving them also doesn't change anything (I made sure that every tensor going into `rasterize_cuda.forward_face_index_map` is on device `cuda:1`).
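For completeness, this is roughly how I verified the device placement (a minimal sketch; `assert_on_device` is a hypothetical helper I wrote for debugging, not part of the renderer):

```python
import torch

def assert_on_device(dev, **tensors):
    # Hypothetical debugging helper: fail loudly if any named tensor
    # lives on a device other than the expected one.
    for name, t in tensors.items():
        assert t.device == dev, f"{name} is on {t.device}, expected {dev}"

# Example with CPU tensors; in the real check I used dev = torch.device('cuda:1')
dev = torch.device('cpu')
assert_on_device(dev, vertices=torch.zeros(1, 3, 3), faces=torch.zeros(1, 1, 3))
```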
Could you tell me what could be happening here? Thanks!