NVlabs / nvdiffrecmc

Official code for the NeurIPS 2022 paper "Shape, Light, and Material Decomposition from Images using Monte Carlo Rendering and Denoising".

Optimization fails with a GPU memory allocation error, but the run only uses half of the GPU memory #3

Open hzhshok opened 1 year ago

hzhshok commented 1 year ago

Hello, thanks for your great work on nvdiffrec!

This method gives better quality (higher PSNR than nvdiffrec), but the run failed during the optimization phase.

Could you please give me some suggestions as to whether this is a bug or a problem with my configuration?
The run uses only about half of the total GPU memory, yet the optimization step fails to allocate GPU memory.

GPU hardware: RTX 3090 (24 GB)

GPU memory usage while running:

C:\Users\jinshui>nvidia-smi.exe
Tue Oct 11 10:59:11 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 512.77       Driver Version: 512.77       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  WDDM | 00000000:01:00.0  On |                  N/A |
| 58%   68C    P2   161W / 350W |  12548MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Image resolution: 2976 × 2976

Configuration file:

{
    "ref_mesh": "data/nerf_synthetic/xxx",
    "random_textures": true,
    "iter": 9000,
    "save_interval": 100,
    "texture_res": [2048, 2048],
    "train_res": [1408, 1408],
    "batch": 1,
    "learning_rate": [0.05, 0.003],
    "dmtet_grid": 128,
    "mesh_scale": 2.5,
    "validate": true,
    "n_samples": 10,
    "denoiser": "bilateral",
    "laplace_scale": 3000,
    "display": [{"latlong": true}, {"bsdf": "kd"}, {"bsdf": "ks"}, {"bsdf": "normal"}],
    "background": "white",
    "transparency": true,
    "out_dir": "nerf_xxx"
}

Console error:

Running validation
 MSE,      PSNR
 0.00557304, 22.581
kd shape torch.Size([1, 2048, 2048, 4])
Cuda path C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4
End of OptiXStateWrapper
Base mesh has 75052 triangles and 37853 vertices.
Avg edge length: 0.023254
OptiXStateWrapper destructor
Writing mesh: out/nerf_jianlong\dmtet_mesh/mesh.obj
    writing 37853 vertices
    writing 67605 texcoords
    writing 37853 normals
    writing 75052 faces
Writing material: out/nerf_jianlong\dmtet_mesh/mesh.mtl
Done exporting mesh
Traceback (most recent call last):
  File "D:\zhansheng\proj\windows\3d\nvdiffrecmc\train.py", line 665, in
    geometry, mat = optimize_mesh(denoiser, glctx, glctx_display, geometry, mat, lgt, dataset_train, dataset_validate, FLAGS,
  File "D:\zhansheng\proj\windows\3d\nvdiffrecmc\train.py", line 424, in optimize_mesh
    img_loss, reg_loss = geometry.tick(glctx, target, lgt, opt_material, image_loss_fn, it, FLAGS, denoiser)
  File "D:\zhansheng\proj\windows\3d\nvdiffrecmc\geometry\dlmesh.py", line 65, in tick
    buffers = render.render_mesh(FLAGS, glctx, opt_mesh, target['mvp'], target['campos'], target['light'] if lgt is None else lgt, target['resolution'],
  File "D:\zhansheng\proj\windows\3d\nvdiffrecmc\render\render.py", line 327, in render_mesh
    accum = composite_buffer(key, layers, torch.zeros_like(layers[0][0][key]), True)
  File "D:\zhansheng\proj\windows\3d\nvdiffrecmc\render\render.py", line 290, in composite_buffer
    accum = dr.antialias(accum.contiguous(), rast, v_pos_clip, mesh.t_pos_idx.int())
  File "C:\Users\jinshui\anaconda3\envs\dmodel\lib\site-packages\nvdiffrast\torch\ops.py", line 702, in antialias
    return _antialias_func.apply(color, rast, pos, tri, topology_hash, pos_gradient_boost)
  File "C:\Users\jinshui\anaconda3\envs\dmodel\lib\site-packages\nvdiffrast\torch\ops.py", line 650, in forward
    out, work_buffer = _get_plugin().antialias_fwd(color, rast, pos, tri, topology_hash)
RuntimeError: CUDA out of memory. Tried to allocate 62.00 MiB (GPU 0; 24.00 GiB total capacity; 19.61 GiB already allocated; 0 bytes free; 20.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
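For reference, the allocator hint at the end of the error message can be tried like this (a minimal sketch, assuming a recent PyTorch that honors PYTORCH_CUDA_ALLOC_CONF; the 128 MiB split size is only an example value, and this mitigates fragmentation rather than shrinking a working set that is genuinely too large):

```python
# Sketch only (not part of the repo): apply the allocator hint from the error message.
# Must run before CUDA is initialized, e.g. at the very top of train.py or via the shell.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value

import torch  # imported after setting the variable so the allocator picks it up
```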

Regards, Zhansheng

jmunkberg commented 1 year ago

Thanks @hzhshok,

Yes, you are running out of memory. Try nvidia-smi --query-gpu=memory.used --format=csv -lms 100 in a second prompt to log memory usage more frequently.
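As a complement (just a sketch, not code from this repo), you can also print PyTorch's own allocator statistics from inside the training loop, for example right before the render/antialias step that fails. Note that buffers allocated by OptiX/nvdiffrast outside the PyTorch caching allocator will not show up in these numbers:

```python
import torch

def log_cuda_memory(tag: str) -> None:
    # Report PyTorch's view of GPU memory (in MiB) at the point of the call.
    allocated = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    peak = torch.cuda.max_memory_allocated() / 2**20
    print(f"[{tag}] allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB, peak={peak:.0f} MiB")
```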

On a 24 GB GPU, you likely need to reduce "train_res" further.
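For example (illustrative values only, not a verified setting for this scene), the corresponding entry in the config above could be lowered to something like:

```json
"train_res": [1024, 1024]
```

leaving the rest of the configuration unchanged.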