HengyiWang / Co-SLAM

[CVPR'23] Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM
https://hengyiwang.github.io/projects/CoSLAM.html
Apache License 2.0

cuda out of memory error #7

Closed aartykov closed 1 year ago

aartykov commented 1 year ago

Hello. I am trying to run the code on the KITTI-360 dataset. I use LiDAR-projected images instead of depth maps, and I get a CUDA out of memory error. What should I do?

HengyiWang commented 1 year ago

Hi @aartykov, thank you for reaching out. Generally, our Co-SLAM requires approximately 4GB of GPU memory to run. In order to assist you better, could you please provide the complete error message and the config file you are using? This will help me identify the specific issue you're facing.

aartykov commented 1 year ago

Hey, thank you for your quick response. Here is the error message:

```
Start running... Saving config and script...
kf: 2922
Pixels to save: 19853
SDF resolution: 2500
Use blob
Hash size 16
0it [00:00, ?it/s]First frame mapping...
coslam.py:179: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  indice_h, indice_w = indice % (self.dataset.H), indice // (self.dataset.H)
0it [00:15, ?it/s]
Traceback (most recent call last):
  File "coslam.py", line 684, in <module>
    slam.run()
  File "coslam.py", line 610, in run
    self.first_frame_mapping(batch, self.config['mapping']['first_iters'])
  File "coslam.py", line 196, in first_frame_mapping
    self.save_mesh(0)
  File "coslam.py", line 587, in save_mesh
    mesh_savepath=mesh_savepath)
  File "/home/elif/miniconda3/envs/coslam/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/elif/Co-SLAM/utils.py", line 80, in extract_mesh
    flat = query_pts.reshape([-1, 3]).to(bounding_box[:, 0])
RuntimeError: CUDA out of memory. Tried to allocate 22.42 GiB (GPU 0; 23.70 GiB total capacity; 27.72 MiB already allocated; 21.40 GiB free; 58.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

aartykov commented 1 year ago

And here is the config file:

```yaml
dataset: 'kitti360'

data:
  datadir: /media/elif/528c46a6-5c9d-4884-b9ef-0e0365014969/kitti_360/
  trainskip: 1
  downsample: 1    # Cam size downsample factor
  sc_factor: 1     # NK ---> Important !!!
  translation: 0   # not used (not important)
  num_workers: 4   # Dataloader number of workers
  output: ./output/kitti-360/
  exp_name: demo

mapping:
  bound: [[-25,25],[-25,25],[-25,25]]                 # NK ---> Important !!!!
  marching_cubes_bound: [[-25,25],[-25,25],[-25,25]]  # NK ---> Important !!!!
  sample: 2048
  first_mesh: True
  iters: 20
  lr_embed: 0.01
  lr_decoder: 0.01
  lr_rot: 0.001
  lr_trans: 0.001
  keyframe_every: 5
  map_every: 5
  n_pixels: 0.05
  first_iters: 1000
  optim_cur: True
  min_pixels_cur: 100
  map_accum_step: 1
  pose_accum_step: 5
  map_wait_step: 0
  filter_depth: False

tracking:
  iter: 10
  sample: 1024
  pc_samples: 40960
  lr_rot: 0.01
  lr_trans: 0.01
  ignore_edge_W: 20  # NK
  ignore_edge_H: 20  # NK
  iter_point: 0      # NK
  wait_iters: 100
  const_speed: True
  best: False

grid:
  enc: 'HashGrid'
  tcnn_encoding: True
  hash_size: 16
  voxel_color: 0.04
  voxel_sdf: 0.02
  oneGrid: True

pos:
  enc: 'OneBlob'
  n_bins: 16

decoder:
  geo_feat_dim: 15
  hidden_dim: 32
  num_layers: 2
  num_layers_color: 2
  hidden_dim_color: 32
  tcnn_network: False

cam:
  H: 348                    # 376; NOTE: size of the LiDAR-projected image
  W: 1141                   # 1408
  fx: 552.554261            # * 348/376
  fy: 552.554261
  cx: 682.049453
  cy: 238.769549
  png_depth_scale: 5000.0   # NK (not known) ---> Important !!!
  crop_edge: 0              # if >0, crops edges of cam frame
  near: 0                   # NK
  far: 5                    # NK
  depth_trunc: 5.           # NK

training:
  rgb_weight: 1.0
  depth_weight: 0.1
  sdf_weight: 5000
  fs_weight: 10
  eikonal_weight: 0
  smooth_weight: 0.00000001
  smooth_pts: 64
  smooth_vox: 0.04
  smooth_margin: 0.
  n_samples: 256
  n_samples_d: 64
  range_d: 0.25
  n_range_d: 21
  n_importance: 0
  perturb: 1
  white_bkgd: False
  trunc: 0.05
  rot_rep: 'axis_angle'
  rgb_missing: 1.0   # Would cause some noisy points around free space, but better completion

mesh:
  resolution: 512
  vis: 500
  voxel_eval: 0.05
  voxel_final: 0.03
  visualisation: False
```

HengyiWang commented 1 year ago

Hi @aartykov, thank you for providing your error message and the config file. I have fixed the OOM issue in the latest commit. The reason behind the OOM error is that your scene size is quite large, spanning 50m x 50m x 50m, with a voxel size of approximately 5cm. This means that it is not feasible to transfer all query points to the GPU at once due to memory limitations.
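A common way to avoid this kind of allocation is to evaluate the dense grid in chunks instead of moving every query point to the GPU at once. A minimal sketch of that idea, assuming a `query_fn` that maps a batch of 3D points to SDF values (names and chunk size are illustrative, not the actual Co-SLAM code):

```python
import torch

def query_sdf_chunked(query_pts, query_fn, device, chunk_size=2**20):
    """Evaluate an SDF over a dense point grid in chunks to bound GPU memory.

    query_pts: (..., 3) tensor of grid coordinates, kept on the CPU.
    query_fn:  callable mapping an (M, 3) CUDA tensor to M SDF values.
    """
    flat = query_pts.reshape(-1, 3)
    out = torch.empty(flat.shape[0], dtype=torch.float32)
    for start in range(0, flat.shape[0], chunk_size):
        chunk = flat[start:start + chunk_size].to(device, dtype=torch.float32)
        with torch.no_grad():
            out[start:start + chunk_size] = query_fn(chunk).reshape(-1).cpu()
        del chunk  # release the chunk before transferring the next one
    return out.reshape(query_pts.shape[:-1])
```

With chunks of about 2^20 points, each transfer is on the order of ten megabytes instead of tens of gigabytes, so the mesh resolution no longer dictates peak GPU memory.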

Since the KITTI-360 dataset is relatively large-scale, I recommend using the training and tracking config from the ScanNet dataset. You can increase the hash_size parameter in your scene representation to accommodate the larger scene size. Additionally, adjusting the voxel size for representation and mesh may also be necessary.
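To put rough numbers on the hash_size suggestion: with the 50 m bound and `voxel_sdf: 0.02` above, the finest grid level has vastly more voxels than a 2^16-entry hash table, so collisions are heavy. A small illustration (the alternative hash_size values are examples of the trade-off, not tuned recommendations):

```python
bound = 50.0       # scene extent per axis [m], from the mapping bound above
voxel_sdf = 0.02   # SDF voxel size [m], from the grid section above

voxels_per_axis = bound / voxel_sdf   # 2500, the "SDF resolution" in the log
n_voxels = voxels_per_axis ** 3       # ~1.6e10 voxels at the finest level

for hash_size in (16, 19, 22):
    entries = 2 ** hash_size
    print(f"hash_size={hash_size}: {entries:>9d} entries, "
          f"~{n_voxels / entries:,.0f} voxels per entry at the finest level")
```

Coarsening `voxel_sdf` / `voxel_color` (and `voxel_eval` / `voxel_final` for the mesh) shrinks the voxel count cubically, so it complements a larger hash table rather than replacing it.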

I am very curious to see if you can apply Co-SLAM to an outdoor scene like KITTI-360, but please note that further fine-tuning and adjustments will likely be required in addition to the suggestions I've provided above. It may take some experimentation to achieve good results :)

aartykov commented 1 year ago

Thank you for your assistance. The training runs now, but the loss constantly diverges. Could you please comment on the meaning of the important parameters in the TUM config file? Thank you.

HengyiWang commented 1 year ago

Hi @aartykov, you can check the documentation here.

Lmy971109 commented 9 months ago

Hi @aartykov, have you run Co-SLAM on the KITTI dataset? After debugging, I found that the predicted camera poses on the KITTI data were completely different from the ground-truth poses, and there was a scaling problem. How did you finally set your config parameters? Looking forward to hearing from you! Thanks
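One thing worth ruling out for a global scale mismatch with custom LiDAR depth is the `png_depth_scale` value, which the config above marks as unknown. A quick sanity check, assuming 16-bit depth PNGs (the file path is a placeholder):

```python
import numpy as np
import imageio.v2 as imageio

png_depth_scale = 5000.0                 # value used in the config above
depth_raw = imageio.imread("path/to/lidar_depth.png").astype(np.float32)
depth_m = depth_raw / png_depth_scale    # raw PNG values -> metres

valid = depth_m[depth_m > 0]
print(f"depth range: {valid.min():.2f} m to {valid.max():.2f} m")
# An outdoor KITTI-360 frame should span roughly a few metres up to ~80 m;
# values far outside that range suggest png_depth_scale is wrong, which would
# also throw off the metric scale of the estimated trajectory.
```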

Lmy971109 commented 9 months ago

![Uploading pose_500.png…]()