NVlabs / nvdiffrec

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

Can GPU shared memory be used by this feature? #19

Closed hzhshok closed 2 years ago

hzhshok commented 2 years ago

Hello. I am running this project under Windows 11, and I hoped that the GPU's shared memory could relieve the memory pressure, but it still fails to run at higher image resolutions.

So I would like to know whether shared memory can be used by this feature (perhaps it would need support in tiny-cuda-nn?).

Image resolution: 3648x3648. GPU: RTX 3090 (24 GB)

```json
{
    "ref_mesh": "data/nerf_synthetic/xxxx",
    "random_textures": true,
    "iter": 5000,
    "save_interval": 100,
    "texture_res": [1024, 1024],
    "train_res": [3648, 3648],
    "batch": 1,
    "learning_rate": [0.03, 0.01],
    "ks_min": [0, 0.1, 0.0],
    "dmtet_grid": 64,
    "mesh_scale": 1.5,
    "laplace_scale": 3000,
    "display": [{"latlong": true}, {"bsdf": "kd"}, {"bsdf": "ks"}, {"bsdf": "normal"}],
    "layers": 4,
    "background": "white",
    "out_dir": "nerf_xxxx"
}
```
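To put the `train_res` in this config in perspective, here is a rough back-of-envelope estimate (an editorial sketch, not part of the original report): a single float32 RGBA buffer at 3648x3648 already occupies about 200 MiB, and the renderer allocates many such buffers, plus gradients, per layer.

```python
# Rough memory estimate for one float32 RGBA buffer at the
# requested training resolution (illustrative arithmetic only).
res = 3648
channels = 4          # RGBA
bytes_per_float = 4   # float32
buffer_bytes = res * res * channels * bytes_per_float
print(f"one buffer: {buffer_bytes / 2**20:.0f} MiB")
```

A 24 GB card fills up quickly once dozens of buffers of this size are live at once during the backward pass.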

```
Loading extension module renderutils_plugin...
Traceback (most recent call last):
  File "D:\zhansheng\proj\windows\nvdiffrec\train.py", line 594, in <module>
    geometry, mat = optimize_mesh(glctx, geometry, mat, lgt, dataset_train, dataset_validate,
  File "D:\zhansheng\proj\windows\nvdiffrec\train.py", line 415, in optimize_mesh
    img_loss, reg_loss = trainer(target, it)
  File "C:\Users\jinshui\anaconda3\envs\dmodel\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\zhansheng\proj\windows\nvdiffrec\train.py", line 299, in forward
    return self.geometry.tick(glctx, target, self.light, self.material, self.image_loss_fn, it)
  File "D:\zhansheng\proj\windows\nvdiffrec\geometry\dmtet.py", line 218, in tick
    buffers = self.render(glctx, target, lgt, opt_material)
  File "D:\zhansheng\proj\windows\nvdiffrec\geometry\dmtet.py", line 209, in render
    return render.render_mesh(glctx, opt_mesh, target['mvp'], target['campos'], lgt, target['resolution'], spp=target['spp'],
  File "D:\zhansheng\proj\windows\nvdiffrec\render\render.py", line 231, in render_mesh
    layers += [(render_layer(rast, db, mesh, view_pos, lgt, resolution, spp, msaa, bsdf), rast)]
  File "D:\zhansheng\proj\windows\nvdiffrec\render\render.py", line 166, in render_layer
    buffers = shade(gb_pos, gb_geometric_normal, gb_normal, gb_tangent, gb_texc, gb_texc_deriv,
  File "D:\zhansheng\proj\windows\nvdiffrec\render\render.py", line 46, in shade
    all_tex = material['kd_ks_normal'].sample(gb_pos)
  File "D:\zhansheng\proj\windows\nvdiffrec\render\mlptexture.py", line 90, in sample
    p_enc = self.encoder(_texc.contiguous())
  File "C:\Users\jinshui\anaconda3\envs\dmodel\lib\site-packages\torch\nn\modules\module.py", line 1128, in _call_impl
    result = forward_call(*input, **kwargs)
  File "C:\Users\jinshui\anaconda3\envs\dmodel\lib\site-packages\tinycudann\modules.py", line 119, in forward
    output = _module_function.apply(
  File "C:\Users\jinshui\anaconda3\envs\dmodel\lib\site-packages\tinycudann\modules.py", line 31, in forward
    native_ctx, output = native_tcnn_module.fwd(input, params)
RuntimeError: C:/Users/jinshui/AppData/Local/Temp/pip-req-build-ii64hvij/include\tiny-cuda-nn/gpu_memory.h:558 cuMemSetAccess(m_base_address + m_size, n_bytes_to_allocate, &access_desc, 1) failed with error CUDA_ERROR_OUT_OF_MEMORY
Could not free memory: C:/Users/jinshui/AppData/Local/Temp/pip-req-build-ii64hvij/include\tiny-cuda-nn/gpu_memory.h:462 cuMemAddressFree(m_base_address, m_max_size) failed with error CUDA_ERROR_INVALID_VALUE
```

Regards

jmunkberg commented 2 years ago

Hello, I'm not sure what you are referring to with shared memory above. We need dedicated GPU memory for most tensors. A training resolution of 3648x3648 is very large, especially on a GPU with 24 GB. Furthermore, using four layers increases memory almost linearly in the second pass. We usually run at 1k or 2k, and that is on GPUs with 32 or 48 GB. In our experience, a somewhat smaller spatial resolution combined with a larger batch size is most often the better tradeoff for high-quality results.
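Following that advice, a lower-memory variant of the config posted above might change just these fields (illustrative values only, not a tested configuration):

```json
{
    "train_res": [1024, 1024],
    "batch": 4,
    "layers": 2
}
```

Four 1024x1024 samples per batch still touch far fewer pixels per iteration than one 3648x3648 image, and fewer layers shrinks the second rasterization pass accordingly.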

As discussed elsewhere, there are many ways of decreasing the memory requirements of nvdiffrec (rasterizing random crops, or switching from differentiable rasterization to differentiable ray casting and tracing a subset of rays), but that was not a focus for the code release (the code is mainly there to reproduce the results from the paper).
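To illustrate the first idea (rasterizing random crops), the cropping step alone might look like the sketch below. The function name and shapes are hypothetical and not part of the nvdiffrec code base; a real integration would also need to adjust the camera matrices to match the crop.

```python
import numpy as np

def random_crop(img, crop_h, crop_w, rng=None):
    """Return a random spatial crop of an HxWxC image array (hypothetical helper)."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]

# Crop a 3648x3648 target down to 1024x1024 before computing the image loss,
# so only the cropped region's buffers and gradients live on the GPU.
target = np.zeros((3648, 3648, 4), dtype=np.float32)
crop = random_crop(target, 1024, 1024)
print(crop.shape)  # (1024, 1024, 4)
```

Each iteration then backpropagates through a much smaller framebuffer, trading per-step coverage for memory.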

hzhshok commented 2 years ago

Thanks @jmunkberg! Got it! Actually, I was also unsure whether unified memory would let the GPU share/use CPU memory, so I raised this issue to see if someone could give a hint. You know, a 24 GB GPU is still relatively high-end for most individuals and companies :-)

Regards

hzhshok commented 2 years ago

Closing this issue, since the root cause is a hardware limitation.