hzhshok closed this issue 2 years ago
Hello, I'm not sure what you refer to with shared memory above. We need dedicated GPU memory for most tensors. A training resolution of 3648x3648 is very large, especially on a GPU with 24 GB. Furthermore, using four layers will increase memory use almost linearly in the second pass. We usually run at 1k or 2k, and that is with GPUs with 32 or 48 GB. In our experience, a somewhat smaller spatial resolution and a larger batch size is most often a better tradeoff for high-quality results.
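As a rough back-of-the-envelope check on these numbers, the sketch below compares the pixel count at 3648x3648 with the 1k and 2k resolutions mentioned above. It assumes per-pixel buffers dominate memory and scale exactly linearly with pixel count and layer count, which is a simplification; `relative_cost` is a hypothetical helper and not part of nvdiffrec.

```python
def relative_cost(res, layers=1, base_res=1024, base_layers=1):
    """Ratio of (pixels * layers) against a baseline configuration.

    Assumption: memory for per-pixel buffers grows linearly with both
    the number of pixels and the number of layers.
    """
    return (res * res * layers) / (base_res * base_res * base_layers)


# 3648x3648 versus a 1k and a 2k training resolution, single layer
vs_1k = relative_cost(3648, base_res=1024)
vs_2k = relative_cost(3648, base_res=2048)

# Same comparison with four layers, treated as an upper bound
vs_1k_4_layers = relative_cost(3648, layers=4, base_res=1024)

print(f"3648^2 vs 1024^2: {vs_1k:.2f}x")
print(f"3648^2 vs 2048^2: {vs_2k:.2f}x")
print(f"3648^2 with 4 layers vs 1024^2 with 1 layer: {vs_1k_4_layers:.2f}x")
```

Under these assumptions, 3648x3648 has roughly 12.7 times as many pixels as 1024x1024 and about 3.2 times as many as 2048x2048, before the layer count is taken into account.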
As discussed elsewhere, there are many ways of decreasing the memory requirements of nvdiffrec (rasterizing random crops, switching from differentiable rasterization to differentiable ray casting, and tracing a subset of rays), but this is not something we focused on for the code release (the code is mainly there to reproduce the results from the paper).
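To illustrate the first option (random crops), here is a minimal sketch. It is not nvdiffrec code: all function names are hypothetical, and it assumes the renderer takes a 4x4 model-view-projection matrix in clip space, so that left-multiplying by a scale-and-translate matrix zooms the render onto one window of the full image. It also assumes pixel y grows in the same direction as NDC y; a renderer with a flipped y axis would need the window mirrored.

```python
import random


def random_crop_window(full_res, crop_res, rng=random):
    """Pick a random crop_res x crop_res pixel window inside a full_res x full_res image."""
    x0 = rng.randint(0, full_res - crop_res)  # randint is inclusive on both ends
    y0 = rng.randint(0, full_res - crop_res)
    return x0, y0, x0 + crop_res, y0 + crop_res


def pixel_to_ndc(p, full_res):
    """Map a pixel edge coordinate in [0, full_res] to NDC in [-1, 1]."""
    return 2.0 * p / full_res - 1.0


def crop_matrix(x0, y0, x1, y1, full_res):
    """4x4 clip-space matrix that stretches the crop window to fill [-1, 1] in x and y."""
    nx0, nx1 = pixel_to_ndc(x0, full_res), pixel_to_ndc(x1, full_res)
    ny0, ny1 = pixel_to_ndc(y0, full_res), pixel_to_ndc(y1, full_res)
    sx, sy = 2.0 / (nx1 - nx0), 2.0 / (ny1 - ny0)
    tx, ty = -(nx1 + nx0) / (nx1 - nx0), -(ny1 + ny0) / (ny1 - ny0)
    return [
        [sx, 0.0, 0.0, tx],
        [0.0, sy, 0.0, ty],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


# Example: a 1024x1024 window out of a 3648x3648 reference image
window = random_crop_window(full_res=3648, crop_res=1024)
crop = crop_matrix(*window, full_res=3648)
```

In a training loop, one would left-multiply each view's MVP matrix by `crop`, render at 1024x1024, and compare against the same window sliced from the reference image, so that only the crop's buffers need to be held in GPU memory for that iteration.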
Thanks @jmunkberg, got it! I wasn't sure whether the GPU can share or use CPU memory through unified memory, so I raised this to see if someone could give a hint. As you know, a 24 GB GPU is already a relatively new card for most individuals and companies :-).
Regards
I'm closing this issue since the root cause is a hardware limitation.
Hello, I am now running this under Windows 11 and hoped that the GPU's shared memory would relieve the memory issue, but it still fails to run on higher-resolution images.
So I would like to know whether shared memory can be used here? (Or perhaps this needs to be solved in tiny-cuda-nn?)
Image resolution: 3648x3648. GPU: RTX 3090 (24 GB)
{
    "ref_mesh": "data/nerf_synthetic/xxxx",
    "random_textures": true,
    "iter": 5000,
    "save_interval": 100,
    "texture_res": [1024, 1024],
    "train_res": [3648, 3648],
    "batch": 1,
    "learning_rate": [0.03, 0.01],
    "ks_min": [0, 0.1, 0.0],
    "dmtet_grid": 64,
    "mesh_scale": 1.5,
    "laplace_scale": 3000,
    "display": [{"latlong": true}, {"bsdf": "kd"}, {"bsdf": "ks"}, {"bsdf": "normal"}],
    "layers": 4,
    "background": "white",
    "out_dir": "nerf_xxxx"
}
Loading extension module renderutils_plugin...
Traceback (most recent call last):
File "D:\zhansheng\proj\windows\nvdiffrec\train.py", line 594, in
geometry, mat = optimize_mesh(glctx, geometry, mat, lgt, dataset_train, dataset_validate,
File "D:\zhansheng\proj\windows\nvdiffrec\train.py", line 415, in optimize_mesh
img_loss, reg_loss = trainer(target, it)
File "C:\Users\jinshui\anaconda3\envs\dmodel\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "D:\zhansheng\proj\windows\nvdiffrec\train.py", line 299, in forward
return self.geometry.tick(glctx, target, self.light, self.material, self.image_loss_fn, it)
File "D:\zhansheng\proj\windows\nvdiffrec\geometry\dmtet.py", line 218, in tick
buffers = self.render(glctx, target, lgt, opt_material)
File "D:\zhansheng\proj\windows\nvdiffrec\geometry\dmtet.py", line 209, in render
return render.render_mesh(glctx, opt_mesh, target['mvp'], target['campos'], lgt, target['resolution'], spp=target['spp'],
File "D:\zhansheng\proj\windows\nvdiffrec\render\render.py", line 231, in render_mesh
layers += [(render_layer(rast, db, mesh, view_pos, lgt, resolution, spp, msaa, bsdf), rast)]
File "D:\zhansheng\proj\windows\nvdiffrec\render\render.py", line 166, in render_layer
buffers = shade(gb_pos, gb_geometric_normal, gb_normal, gb_tangent, gb_texc, gb_texc_deriv,
File "D:\zhansheng\proj\windows\nvdiffrec\render\render.py", line 46, in shade
all_tex = material['kd_ks_normal'].sample(gb_pos)
File "D:\zhansheng\proj\windows\nvdiffrec\render\mlptexture.py", line 90, in sample
p_enc = self.encoder(_texc.contiguous())
File "C:\Users\jinshui\anaconda3\envs\dmodel\lib\site-packages\torch\nn\modules\module.py", line 1128, in _call_impl
result = forward_call(*input, **kwargs)
File "C:\Users\jinshui\anaconda3\envs\dmodel\lib\site-packages\tinycudann\modules.py", line 119, in forward
output = _module_function.apply(
File "C:\Users\jinshui\anaconda3\envs\dmodel\lib\site-packages\tinycudann\modules.py", line 31, in forward
native_ctx, output = native_tcnn_module.fwd(input, params)
RuntimeError: C:/Users/jinshui/AppData/Local/Temp/pip-req-build-ii64hvij/include\tiny-cuda-nn/gpu_memory.h:558 cuMemSetAccess(m_base_address + m_size, n_bytes_to_allocate, &access_desc, 1) failed with error CUDA_ERROR_OUT_OF_MEMORY
Could not free memory: C:/Users/jinshui/AppData/Local/Temp/pip-req-build-ii64hvij/include\tiny-cuda-nn/gpu_memory.h:462 cuMemAddressFree(m_base_address, m_max_size) failed with error CUDA_ERROR_INVALID_VALUE
Regards