NVlabs / nvdiffrec

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

Crash before Second Pass #124

Open half-potato opened 1 year ago

half-potato commented 1 year ago

Not sure what kind of information you need to debug this.

Base mesh has 88958 triangles and 44260 vertices.
Writing mesh:  out/nerf_car/dmtet_mesh/mesh.obj
    writing 44260 vertices
    writing 88260 texcoords
    writing 44260 normals
    writing 88958 faces
Writing material:  out/nerf_car/dmtet_mesh/mesh.mtl
Done exporting mesh
Traceback (most recent call last):
  File "/home/amai/nvdiffrec/train.py", line 625, in <module>
    geometry, mat = optimize_mesh(glctx, geometry, base_mesh.material, lgt, dataset_train, dataset_validate, FLAGS, 
  File "/home/amai/nvdiffrec/train.py", line 420, in optimize_mesh
    img_loss, reg_loss = trainer(target, it)
  File "/home/amai/.conda/envs/31/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/amai/nvdiffrec/train.py", line 304, in forward
    return self.geometry.tick(glctx, target, self.light, self.material, self.image_loss_fn, it)
  File "/home/amai/nvdiffrec/geometry/dlmesh.py", line 68, in tick
    reg_loss = torch.tensor([0], dtype=torch.float32, device="cuda")
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::Error'
  what():  Cuda error: 700[cudaGraphicsUnregisterResource(s.cudaPosBuffer);]
Exception raised from rasterizeReleaseBuffers at /home/amai/.conda/envs/31/lib/python3.10/site-packages/nvdiffrast/common/rasterize.cpp:616 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fe5edf57497 in /home/amai/.conda/envs/31/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fe5edf2ec94 in /home/amai/.conda/envs/31/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: rasterizeReleaseBuffers(int, RasterizeGLState&) + 0xe1 (0x7fe56cf53f49 in /home/amai/.cache/torch_extensions/py310_cu116/nvdiffrast_plugin/nvdiffrast_plugin.so)
frame #3: RasterizeGLStateWrapper::~RasterizeGLStateWrapper() + 0x33 (0x7fe56cfacee1 in /home/amai/.cache/torch_extensions/py310_cu116/nvdiffrast_plugin/nvdiffrast_plugin.so)
frame #4: std::default_delete<RasterizeGLStateWrapper>::operator()(RasterizeGLStateWrapper*) const + 0x22 (0x7fe56cf939fa in /home/amai/.cache/torch_extensions/py310_cu116/nvdiffrast_plugin/nvdiffrast_plugin.so)
frame #5: std::unique_ptr<RasterizeGLStateWrapper, std::default_delete<RasterizeGLStateWrapper> >::~unique_ptr() + 0x52 (0x7fe56cf88b62 in /home/amai/.cache/torch_extensions/py310_cu116/nvdiffrast_plugin/nvdiffrast_plugin.so)
frame #6: <unknown function> + 0xc06a5 (0x7fe56cf826a5 in /home/amai/.cache/torch_extensions/py310_cu116/nvdiffrast_plugin/nvdiffrast_plugin.so)
frame #7: <unknown function> + 0x355022 (0x7fe5c4c97022 in /home/amai/.conda/envs/31/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0x355eef (0x7fe5c4c97eef in /home/amai/.conda/envs/31/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #19: <unknown function> + 0x23790 (0x7fe604562790 in /usr/lib/libc.so.6)
frame #20: __libc_start_main + 0x8a (0x7fe60456284a in /usr/lib/libc.so.6)

zsh: IOT instruction (core dumped)  python train.py --config configs/nerf_car.json
jmunkberg commented 1 year ago

I would suspect a memory issue. In the second pass we switch to learning 2D textures, so the memory requirement goes up a bit. If you are running near the memory limit, try decreasing the texture resolution a bit, e.g., via the config flag "texture_res": [ 512, 512 ]. If you are running on a GPU with less than 32 GB of memory, you may also want to reduce the batch size.
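For concreteness, a minimal sketch of the relevant config overrides; the values below would replace the corresponding entries in configs/nerf_car.json (only "texture_res" is quoted above, and the "batch" key name is an assumption based on the shipped config format):

{
    "texture_res": [ 512, 512 ],
    "batch": 4
}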

You can track memory usage with nvidia-smi or nvitop (https://github.com/XuehaiPan/nvitop).
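If you prefer to log the peak from inside the training process, here is a small illustrative Python sketch using PyTorch's built-in counters (not part of train.py; it assumes you wrap it around the training loop yourself):

import torch

# Reset the peak-memory counter before the second pass starts ...
torch.cuda.reset_peak_memory_stats()

# ... run some training iterations ...

# ... then report the high-water mark of allocated CUDA memory.
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak CUDA memory allocated: {peak_gib:.2f} GiB")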

half-potato commented 1 year ago

I agree. I just realized that nvdiffrecmc works with batch size 6 without crashing. I'm doing this for benchmarking, so I hope the smaller batch doesn't decrease accuracy too much.