JunyuanDeng / NeRF-LOAM

[ICCV2023] NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping
MIT License
491 stars · 29 forks

torch.cuda.OutOfMemoryError #15

Open iason-r opened 6 months ago

iason-r commented 6 months ago

I'm using a 16 GB GPU. Is this error related to the size of the dataset?

JunyuanDeng commented 6 months ago

16 GB should be enough. You can try lowering the batch size; the simplest way is to divide the `chunk_size` on that line by 10: `chunk_size // 10`. Of course, this will significantly slow down rendering.
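For intuition, here is a minimal sketch (not the repo's actual code; `render_in_chunks` and the doubling step are stand-ins) of why a smaller `chunk_size` lowers peak memory: rays are processed in fixed-size slices, so the largest intermediate buffer scales with the chunk size rather than the total ray count.

```python
def render_in_chunks(rays, chunk_size):
    """Process `rays` in slices of at most `chunk_size` and concatenate results."""
    outputs = []
    for i in range(0, len(rays), chunk_size):
        chunk = rays[i:i + chunk_size]        # peak memory scales with len(chunk)
        outputs.extend(r * 2 for r in chunk)  # stand-in for the real render step
    return outputs

# Dividing chunk_size by 10 means 10x more, but 10x smaller, slices:
# render_in_chunks(rays, chunk_size // 10)
```

The trade-off the reply mentions follows directly: more slices means more kernel launches and loop iterations, hence slower rendering.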

iason-r commented 6 months ago

Which file's `chunk_size` should I modify?

JunyuanDeng commented 6 months ago

You can click the hyperlink above.

iason-r commented 6 months ago

After changing `chunk_size` to `chunk_size // 10` it still errors out:

```
Traceback (most recent call last):
  File "/home/sucronav/.conda/envs/torch/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/sucronav/.conda/envs/torch/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/sucronav/renbin/NeRF-LOAM/src/mapping.py", line 112, in spin
    self.do_mapping(share_data, tracked_frame)
  File "/home/sucronav/renbin/NeRF-LOAM/src/mapping.py", line 179, in do_mapping
    bundle_adjust_frames(
  File "/home/sucronav/renbin/NeRF-LOAM/src/variations/render_helpers.py", line 398, in bundle_adjust_frames
    final_outputs = render_rays(
  File "/home/sucronav/renbin/NeRF-LOAM/src/variations/render_helpers.py", line 211, in render_rays
    intersections, hits = ray_intersect(
  File "/home/sucronav/.local/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sucronav/renbin/NeRF-LOAM/src/variations/voxel_helpers.py", line 534, in ray_intersect
    pts_idx, min_depth, max_depth = svo_ray_intersect(
  File "/home/sucronav/.local/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/sucronav/renbin/NeRF-LOAM/src/variations/voxel_helpers.py", line 108, in forward
    children = children.expand(S * G, *children.size()).contiguous()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.61 GiB (GPU 0; 7.79 GiB total capacity;
979.16 MiB already allocated; 2.48 GiB free; 1016.00 MiB reserved in total by PyTorch) If reserved memory is
>> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory
Management and PYTORCH_CUDA_ALLOC_CONF
```

While the program was running I refreshed `nvidia-smi` once per second; from what I observed, GPU memory usage never reached 100%.
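As the error message itself suggests, fragmentation can sometimes be reduced by capping the caching allocator's split size via `PYTORCH_CUDA_ALLOC_CONF` before launching. A minimal sketch; the value 128 is an example to tune, not a value recommended by the repo:

```shell
# Set the allocator option for this shell session, then launch NeRF-LOAM
# from the same shell as usual.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"
```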

iason-r commented 6 months ago

Screenshot from 2024-01-03 16-18-50

JunyuanDeng commented 6 months ago

8 GB of VRAM is indeed a bit small. I've never run the program split across two GPUs, so I don't know how to modify it for that. If possible, use a GPU with at least 16 GB of VRAM.

iason-r commented 5 months ago

I'm running it on 24 GB of VRAM, but after nearly 24 hours it has only reached 68%. Is that normal?
Screenshot from 2024-01-16 09-33-17

JunyuanDeng commented 5 months ago

Yes, it does currently slow down the longer it runs; that's one of our current optimization targets. You can use the subscene branch to speed things up. Remember to fetch the latest git updates.

iason-r commented 5 months ago

Hmm, why does a 24 GB card also report OutOfMemory?

```
insert keyframe
** current num kfs: 18 **
tracking frame:  99%|███████████████████████████████████████████████████████████████████████████████████▎| 4501/4540 [47:26:23<44:59, 69.22s/it]
Process Process-2:
Traceback (most recent call last):
  File "/home/rb/anaconda3/envs/torch/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/rb/anaconda3/envs/torch/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/rb/NeRF-LOAM/src/mapping.py", line 112, in spin
    self.do_mapping(share_data, tracked_frame)
  File "/home/rb/NeRF-LOAM/src/mapping.py", line 179, in do_mapping
    bundle_adjust_frames(
  File "/home/rb/NeRF-LOAM/src/variations/render_helpers.py", line 398, in bundle_adjust_frames
    final_outputs = render_rays(
  File "/home/rb/NeRF-LOAM/src/variations/render_helpers.py", line 211, in render_rays
    intersections, hits = ray_intersect(
  File "/home/rb/anaconda3/envs/torch/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/rb/NeRF-LOAM/src/variations/voxel_helpers.py", line 534, in ray_intersect
    pts_idx, min_depth, max_depth = svo_ray_intersect(
  File "/home/rb/anaconda3/envs/torch/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/rb/NeRF-LOAM/src/variations/voxel_helpers.py", line 108, in forward
    children = children.expand(S * G, *children.size()).contiguous()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.50 GiB (GPU 0; 23.69 GiB total capacity;
5.14 GiB already allocated; 5.22 GiB free; 7.45 GiB reserved in total by PyTorch) If reserved memory is
>> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory
Management and PYTORCH_CUDA_ALLOC_CONF
```

iason-r commented 5 months ago

By the way, earlier `sampled_rays_d = frame.rays_d[sample_mask].cuda()` raised `RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)`, so I changed

```python
sample_mask = frame.sample_mask.cuda()
sampled_rays_d = frame.rays_d[sample_mask].cuda()
```

to

```python
sample_mask = frame.sample_mask.cuda()
sample_mask = sample_mask.cuda()
frame.rays_d = frame.rays_d.cuda()
sampled_rays_d = frame.rays_d[sample_mask]
```

Does this affect anything overall?
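The change matches PyTorch's indexing rule: a boolean mask must live on the same device as the tensor it indexes. A toy sketch of that rule (variable names here are illustrative, not the repo's actual attributes):

```python
import torch

# Pick whatever device is available; the rule holds on CPU and GPU alike.
device = "cuda" if torch.cuda.is_available() else "cpu"

rays_d = torch.randn(8, 3).to(device)           # move the indexed tensor...
sample_mask = (torch.rand(8) > 0.5).to(device)  # ...and the mask to match

# Both tensors now share a device, so boolean indexing succeeds.
sampled_rays_d = rays_d[sample_mask]
```

Functionally the result is the same rows as before; the practical difference is that `frame.rays_d` now lives permanently on the GPU instead of being copied per access, which costs some extra resident VRAM but should not change the values.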

JunyuanDeng commented 5 months ago

`Tried to allocate 5.50 GiB (GPU 0; 23.69 GiB total capacity; 5.14 GiB already allocated; 5.22 GiB free; 7.45 GiB reserved in total by PyTorch)` — with 5 GiB allocated and 7.5 GiB reserved, a 24 GB card should in theory still have about 11.5 GB of VRAM left. Are you running other programs on the GPU at the same time? You can also run the subscene branch, which is faster.

iason-r commented 5 months ago

Hello, I've been trying to run your package for a while now but keep getting stuck on this OutOfMemory problem. While working through it I'd like to study your code. Do you have any advice on learning this codebase and deep learning in general? I'm a first-year master's student; I've been doing engineering projects since enrolling and am only just starting research, and I haven't studied deep learning in depth before.

JunyuanDeng commented 5 months ago

If you haven't studied deep learning, start with the book Dive-into-DL-PyTorch, which is available in both Chinese and English. You could also study machine learning first, though you can skip that if you're short on time. Once you have the deep-learning basics, if you want to get into SLAM, read the book 14 Lectures on Visual SLAM for the fundamentals. With basic SLAM and deep-learning knowledge in place, read a PyTorch implementation of NeRF to understand how NeRF works, and finally look at NeRF-SLAM work such as this repository, picking out the most-cited and most-starred repositories to study.

iason-r commented 5 months ago

Got it, thank you. I got it running over the last couple of days, but the trajectory diverged on several runs; the localization doesn't seem very good.

JunyuanDeng commented 5 months ago

Is that on KITTI? If it's a different scene, you may need to tune the learning rate.

iason-r commented 5 months ago

It's KITTI sequence 00.

hhongwei1009 commented 2 weeks ago

> Got it, thank you. I got it running over the last couple of days, but the trajectory diverged on several runs; the localization doesn't seem very good.

How did you get it to run? I'm also stuck on out of memory.