Open mush881212 opened 1 year ago
Thanks @mush881212 ,
I suspect OptiX, which is quite sensitive to the driver installed on the machine. Our code uses OptiX 7.3, which requires an Nvidia display driver version 465 or higher. Perhaps verify that a standalone example from the OptiX 7.3 SDK runs fine on that machine.
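As a quick check, here is a minimal sketch that queries the installed driver with nvidia-smi and compares it against the 465 threshold mentioned above (this is not part of the repo, and it assumes nvidia-smi is on the PATH):

```python
# Minimal sketch: query the Nvidia driver version and check it against the
# 465 minimum that OptiX 7.3 requires. Assumes nvidia-smi is on the PATH.
import subprocess

driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.strip().splitlines()[0]

major = int(driver.split(".")[0])
print(f"Driver {driver}: {'OK for OptiX 7.3' if major >= 465 else 'too old, needs >= 465'}")
```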
One alternative may be to use our old code base, https://github.com/NVlabs/nvdiffrec, which is a similar reconstruction pipeline, but without the OptiX ray tracing step.
Note also that the A100 GPU does not have any RT Cores (for ray tracing acceleration), so the ray tracing performance will be lower than what we reported in the paper (we measured on an A6000 RTX GPU).
Hi @jmunkberg,
I think the problem is due to the driver version, because my driver version is too old to support OptiX 7.3. I will try it on another device and upgrade the driver version. Thanks for your help!
Hi, I wonder if this problem has been solved? I have the same issue with driver version 520.61.05 on a V100 and am wondering how to fix it.
Hi @Sheldonmao,
I solved this issue by updating the driver version and using an RTX 3090 device instead.
Driver version: 465.19.01
CUDA version: 11.3
You could try these settings.
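If it helps, here is a small sketch (not specific to this repo) that prints what PyTorch itself reports for the CUDA toolkit and the visible GPU, so you can compare against the setup above:

```python
# Small sketch: print the CUDA toolkit PyTorch was built with and the visible GPU,
# to compare against the working setup (driver 465.19.01, CUDA 11.3, RTX 3090).
import torch

print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```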
Could a relatively new GPU or driver be a problem? My GPU is an RTX 3090, my driver version is 535.146.02, and I'm getting the same segmentation fault as the original author.
Newer GPUs and drivers shouldn't be an issue, I hope. It has been a while since we released this code, but I just tested on two setups without issues.
Setup 1: Windows desktop, RTX 6000 Ada Generation, driver 545.84, PyTorch 2.0.0+cu117
Setup 2: Linux server, V100, driver 515.86.01, using the Dockerfile from the nvdiffrecmc repo: https://github.com/NVlabs/nvdiffrecmc/blob/main/docker/Dockerfile
Hi Team,
Thanks for your amazing work! I tried to run the program, but I get a segmentation fault when calling DMTetGeometry(FLAGS.dmtet_grid, FLAGS.mesh_scale, FLAGS) in train.py. I tracked down the error and found that it happens when calling ou.OptiXContext() in dmtet.py. I think the crash occurs in the call to _plugin.OptiXStateWrapper(os.path.dirname(__file__), torch.utils.cpp_extension.CUDA_HOME) in ops.py, but I don't know how to fix it.
I tried reducing the batch size from 8 to 1 and the training resolution from 512x512 to 128x128, but the problem persists. Can you give some advice on how to solve this problem?
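In case it helps narrow this down, below is a minimal repro sketch that creates only the OptiX context and skips the rest of the pipeline (assuming the import path used in dmtet.py, i.e. from render import optixutils as ou, and running from the repo root):

```python
# Minimal repro sketch: create only the OptiX context to confirm that
# _plugin.OptiXStateWrapper is where the crash happens.
# Assumes it is run from the repo root so that `render.optixutils` is importable.
import torch
from render import optixutils as ou

assert torch.cuda.is_available(), "a CUDA device is required"
ctx = ou.OptiXContext()  # segfaults here if the OptiX / driver setup is broken
print("OptiX context created successfully")
```

If this short script crashes too, the problem is in the OptiX plugin or driver setup rather than in the training code.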
GPU Hardware:
Nvidia A100 (32G) on a server
Console error: