NVlabs / BundleSDF

[CVPR 2023] BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
https://bundlesdf.github.io/

numpy.core._exceptions.MemoryError: Unable to allocate large memory #104

Closed · mqcmd196 closed this 4 months ago

mqcmd196 commented 10 months ago

Hi. I've followed your instructions and tried to execute run_custom.py

I executed:

(py38) root@spada:/home/obinata/Programs/BundleSDF/BundleSDF# python run_custom.py --mode run_video --video_dir /home/obinata/Programs/BundleSDF/original_dataset/ --out_folder /home/obinata/Programs/BundleSDF/bundlesdf_original_dataset/ --use_segmenter 1 --use_gui 0 --debug_level 3

Then the program outputs:

[2023-10-24 01:45:28.470] [warning] [Bundler.cpp:49] Connected to nerf_port 9999
[2023-10-24 01:45:28.470] [warning] [FeatureManager.cpp:2084] Connected to port 5555
default_cfg {'backbone_type': 'ResNetFPN', 'resolution': (8, 2), 'fine_window_size': 5, 'fine_concat_coarse_feat': True, 'resnetfpn': {'initial_dim': 128, 'block_dims': [128, 196, 256]}, 'coarse': {'d_model': 256, 'd_ffn': 256, 'nhead': 8, 'layer_names': ['self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross'], 'attention': 'linear', 'temp_bug_fix': False}, 'match_coarse': {'thr': 0.2, 'border_rm': 2, 'match_type': 'dual_softmax', 'dsmax_temperature': 0.1, 'skh_iters': 3, 'skh_init_bin_score': 1.0, 'skh_prefilter': True, 'train_coarse_percent': 0.4, 'train_pad_num_gt_min': 200}, 'fine': {'d_model': 128, 'd_ffn': 128, 'nhead': 8, 'layer_names': ['self', 'cross'], 'attention': 'linear'}}
[bundlesdf.py] last_stamp 1698120894729592800
[bundlesdf.py] keyframes#: 1
[tool.py] compute_scene_bounds_worker start
[tool.py] compute_scene_bounds_worker done
[tool.py] merge pcd
[tool.py] compute_translation_scales done
translation_cvcam=[ 0.11020941  0.52164938 -2.79947506], sc_factor=0.30735080368031126
[nerf_runner.py] Octree voxel dilate_radius:1
level 0, resolution: 16
level 1, resolution: 20
level 2, resolution: 24
level 3, resolution: 28
level 4, resolution: 34
level 5, resolution: 41
level 6, resolution: 49
level 7, resolution: 59
level 8, resolution: 71
level 9, resolution: 85
level 10, resolution: 102
level 11, resolution: 123
level 12, resolution: 148
level 13, resolution: 177
level 14, resolution: 213
level 15, resolution: 256
GridEncoder: input_dim=3 n_levels=16 level_dim=2 resolution=16 -> 256 per_level_scale=1.2030 params=(20411696, 2) gridtype=hash align_corners=False
sc_factor 0.30735080368031126
translation [ 0.11020941  0.52164938 -2.79947506]
[nerf_runner.py] denoise cloud
[nerf_runner.py] Denoising rays based on octree cloud
[nerf_runner.py] bad_mask#=355
rays torch.Size([18083, 12])
Start training
[nerf_runner.py] train progress 0/2001
[nerf_runner.py] Iter: 0, valid_samples: 655360/655360, valid_rays: 2048/2048, loss: 24.5833321, rgb_loss: 24.4866676, rgb0_loss: 0.0000000, fs_rgb_loss: 0.0000000, depth_loss: 0.0000000, depth_loss0: 0.0000000, fs_loss: 0.0026700, point_cloud_loss: 0.0000000, point_cloud_normal_loss: 0.0000000, sdf_loss: 0.0003162, eikonal_loss: 0.0000000, variation_loss: 0.0000000, truncation(meter): 0.0100000, pose_reg: 0.0000000, reg_features: 0.0936778,

[nerf_runner.py] train progress 200/2001
[nerf_runner.py] train progress 400/2001
[nerf_runner.py] train progress 600/2001
[nerf_runner.py] train progress 800/2001
[nerf_runner.py] train progress 1000/2001
[nerf_runner.py] train progress 1200/2001
[nerf_runner.py] train progress 1400/2001
[nerf_runner.py] train progress 1600/2001
[nerf_runner.py] train progress 1800/2001
[nerf_runner.py] train progress 2000/2001
cp: cannot stat '/home/obinata/Programs/BundleSDF/bundlesdf_original_dataset///nerf_with_bundletrack_online/image_step_*.png': No such file or directory
Traceback (most recent call last):
  File "run_custom.py", line 223, in <module>
    run_one_video(video_dir=args.video_dir, out_folder=args.out_folder, use_segmenter=args.use_segmenter, use_gui=args.use_gui)
  File "run_custom.py", line 107, in run_one_video
    run_one_video_global_nerf(out_folder=out_folder)
  File "run_custom.py", line 152, in run_one_video_global_nerf
    tracker.run_global_nerf(reader=reader, get_texture=True, tex_res=512)
  File "/home/obinata/Programs/BundleSDF/BundleSDF/bundlesdf.py", line 747, in run_global_nerf
    mesh,sigma,query_pts = nerf.extract_mesh(voxel_size=self.cfg_nerf['mesh_resolution'],isolevel=0, return_sigma=True)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/obinata/Programs/BundleSDF/BundleSDF/nerf_runner.py", line 1363, in extract_mesh
    query_pts = torch.tensor(np.stack(np.meshgrid(tx, ty, tz, indexing='ij'), -1).astype(np.float32).reshape(-1,3)).float().cuda()
  File "<__array_function__ internals>", line 200, in meshgrid
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/numpy/lib/function_base.py", line 5045, in meshgrid
    output = [x.copy() for x in output]
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/numpy/lib/function_base.py", line 5045, in <listcomp>
    output = [x.copy() for x in output]
numpy.core._exceptions.MemoryError: Unable to allocate 257. GiB for an array with shape (3254, 3254, 3254) and data type float64
Process Process-4:
Traceback (most recent call last):
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/obinata/Programs/BundleSDF/BundleSDF/bundlesdf.py", line 89, in run_nerf
    join = p_dict['join']
  File "<string>", line 2, in __getitem__
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod
    kind, result = conn.recv()
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
[2023-10-24 01:47:18.977] [warning] [Bundler.cpp:59] Destructor
[2023-10-24 01:47:19.280] [warning] [Bundler.cpp:59] Destructor

The program apparently tries to allocate a huge amount of memory (257 GiB). Do you have any idea how to reduce its memory usage? Or is something wrong with my dataset? I've attached my dataset: https://drive.google.com/drive/folders/1-gNpxjGda-10gv2FvdFLZlTLJ8bFktuO?usp=sharing
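If I read the traceback correctly, the size follows directly from the grid shape: np.meshgrid materializes three dense float64 arrays of shape (3254, 3254, 3254), and each one alone is about 257 GiB. Here is a minimal sketch of the arithmetic, plus one workaround I considered (hypothetical, not BundleSDF's API: emit the flattened query points in float32 chunks instead of the full grid; tx/ty/tz mirror the names in the traceback):

```python
import numpy as np

# Each of the three meshgrid outputs has shape (3254, 3254, 3254) in float64:
n = 3254
print(n**3 * 8 / 2**30)  # ~256.7 GiB -- matches "Unable to allocate 257. GiB"

# Hypothetical workaround (not BundleSDF's API): yield the flattened query
# points in float32 chunks instead of materializing all three float64 grids.
def query_points_in_chunks(tx, ty, tz, chunk=8):
    for i in range(0, len(tx), chunk):
        xs, ys, zs = np.meshgrid(tx[i:i + chunk], ty, tz, indexing='ij')
        yield np.stack([xs, ys, zs], -1).astype(np.float32).reshape(-1, 3)
```

Or, since 3254 samples per axis seems far too many, would increasing cfg_nerf['mesh_resolution'] (the voxel_size passed to extract_mesh in the traceback) be the intended way to shrink the grid?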

Before running into this, the program drops into pdb. Whenever it does, I type c to continue the script. Is this the expected behavior?

> /home/obinata/Programs/BundleSDF/BundleSDF/bundlesdf.py(452)process_new_frame()
-> for id in ids:
(Pdb) c
[bundlesdf.py] trying new ref frame 1698120851765039444
[bundlesdf.py] frame_pairs: 1
[2023-10-24 01:45:24.789] [warning] [FeatureManager.cpp:2690] _raw_matches found exsting pair (1698120894462734461, 1698120851765039444)
[bundlesdf.py] frame 1698120894462734461 has not suitable ref_frame, mark as FAIL
[2023-10-24 01:45:24.789] [warning] [Bundler.cpp:67] forgetting frame 1698120894462734461
[2023-10-24 01:45:24.789] [warning] [FeatureManager.cpp:469] forgetting frame 1698120894462734461
[bundlesdf.py] processNewFrame done 1698120894462734461
[bundlesdf.py] rematch_after_nerf: True
[2023-10-24 01:45:24.789] [warning] [Bundler.cpp:961] Welcome saveNewframeResult
[2023-10-24 01:45:24.813] [warning] [Bundler.cpp:1110] saveNewframeResult done
[bundlesdf.py] percentile denoise start
[bundlesdf.py] percentile denoise done
[bundlesdf.py] processNewFrame start 1698120894529448032
[bundlesdf.py] process frame 1698120894529448032
[bundlesdf.py] frame_pairs: 1
[loftr_wrapper.py] image0: torch.Size([1, 1, 400, 400])
[loftr_wrapper.py] net forward
[loftr_wrapper.py] mconf, 0.20438554883003235 0.9797393083572388
[loftr_wrapper.py] pair_ids (180,)
[loftr_wrapper.py] corres: (180, 5)
[2023-10-24 01:45:24.877] [warning] [FeatureManager.cpp:1589] start multi pair ransac GPU, pairs#=1
[2023-10-24 01:45:24.878] [warning] [FeatureManager.cpp:1695] after ransac, frame 1698120894529448032 and 1698120851765039444 has too few matches #0, ignore
> /home/obinata/Programs/BundleSDF/BundleSDF/bundlesdf.py(452)process_new_frame()
-> for id in ids:
(Pdb) c
[bundlesdf.py] trying new ref frame 1698120851765039444
[bundlesdf.py] frame_pairs: 1
[2023-10-24 01:45:25.021] [warning] [FeatureManager.cpp:2690] _raw_matches found exsting pair (1698120894529448032, 1698120851765039444)
[bundlesdf.py] frame 1698120894529448032 has not suitable ref_frame, mark as FAIL
[2023-10-24 01:45:25.021] [warning] [Bundler.cpp:67] forgetting frame 1698120894529448032
[2023-10-24 01:45:25.021] [warning] [FeatureManager.cpp:469] forgetting frame 1698120894529448032
[bundlesdf.py] processNewFrame done 1698120894529448032
[bundlesdf.py] rematch_after_nerf: True
[2023-10-24 01:45:25.021] [warning] [Bundler.cpp:961] Welcome saveNewframeResult
[2023-10-24 01:45:25.046] [warning] [Bundler.cpp:1110] saveNewframeResult done
[bundlesdf.py] percentile denoise start
[bundlesdf.py] percentile denoise done
[bundlesdf.py] processNewFrame start 1698120894596162796
[bundlesdf.py] process frame 1698120894596162796
[bundlesdf.py] frame_pairs: 1
[loftr_wrapper.py] image0: torch.Size([1, 1, 400, 400])
[loftr_wrapper.py] net forward
[loftr_wrapper.py] mconf, 0.20218929648399353 0.9770877361297607
[loftr_wrapper.py] pair_ids (199,)
[loftr_wrapper.py] corres: (199, 5)
[2023-10-24 01:45:25.111] [warning] [FeatureManager.cpp:1589] start multi pair ransac GPU, pairs#=1
[2023-10-24 01:45:25.112] [warning] [FeatureManager.cpp:1695] after ransac, frame 1698120894596162796 and 1698120851765039444 has too few matches #0, ignore
> /home/obinata/Programs/BundleSDF/BundleSDF/bundlesdf.py(452)process_new_frame()
-> for id in ids:
(Pdb) c
[bundlesdf.py] trying new ref frame 1698120851765039444
[bundlesdf.py] frame_pairs: 1
[2023-10-24 01:45:25.210] [warning] [FeatureManager.cpp:2690] _raw_matches found exsting pair (1698120894596162796, 1698120851765039444)
[bundlesdf.py] frame 1698120894596162796 has not suitable ref_frame, mark as FAIL
[2023-10-24 01:45:25.210] [warning] [Bundler.cpp:67] forgetting frame 1698120894596162796
[2023-10-24 01:45:25.210] [warning] [FeatureManager.cpp:469] forgetting frame 1698120894596162796
[bundlesdf.py] processNewFrame done 1698120894596162796
[bundlesdf.py] rematch_after_nerf: True
[2023-10-24 01:45:25.210] [warning] [Bundler.cpp:961] Welcome saveNewframeResult
[2023-10-24 01:45:25.234] [warning] [Bundler.cpp:1110] saveNewframeResult done
[bundlesdf.py] percentile denoise start
[bundlesdf.py] percentile denoise done
[bundlesdf.py] processNewFrame start 1698120894662878036
[bundlesdf.py] process frame 1698120894662878036
[bundlesdf.py] frame_pairs: 1
[loftr_wrapper.py] image0: torch.Size([1, 1, 400, 400])
[loftr_wrapper.py] net forward
[loftr_wrapper.py] mconf, 0.2009984850883484 0.9146439433097839
[loftr_wrapper.py] pair_ids (209,)
[loftr_wrapper.py] corres: (209, 5)
[2023-10-24 01:45:25.296] [warning] [FeatureManager.cpp:1589] start multi pair ransac GPU, pairs#=1
[2023-10-24 01:45:25.296] [warning] [FeatureManager.cpp:1695] after ransac, frame 1698120894662878036 and 1698120851765039444 has too few matches #0, ignore

My PC specs are:

CPU: AMD Ryzen 9 5950X

RAM: 128GB

GPU: RTX 3090Ti

wenbowen123 commented 10 months ago

What is the rough size of your object? It seems to be very large.

wenbowen123 commented 10 months ago

Reading your log again, it seems the tracking gets lost in the middle. Can you check whether the video segmentation works correctly?
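One quick way to check is to overlay each mask on its RGB frame and flip through the results. A minimal sketch, assuming the custom-dataset layout from the README (rgb/ and masks/ folders with matching filenames; the paths are placeholders):

```python
import os
import cv2
import numpy as np

# Blend each mask onto its RGB frame so segmentation quality is easy to
# eyeball across the whole video (assumes rgb/ and masks/ share filenames).
video_dir = '/path/to/your/dataset'  # placeholder
out_dir = 'mask_check'
os.makedirs(out_dir, exist_ok=True)

for name in sorted(os.listdir(os.path.join(video_dir, 'rgb'))):
    rgb = cv2.imread(os.path.join(video_dir, 'rgb', name))
    mask = cv2.imread(os.path.join(video_dir, 'masks', name), cv2.IMREAD_GRAYSCALE)
    if rgb is None or mask is None:
        continue
    green = np.zeros_like(rgb)
    green[:, :] = (0, 255, 0)
    blended = cv2.addWeighted(rgb, 0.6, green, 0.4, 0)
    overlay = rgb.copy()
    overlay[mask > 0] = blended[mask > 0]
    cv2.imwrite(os.path.join(out_dir, name), overlay)
```

If the mask drifts off the object partway through the video, that would explain the lost tracking.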

mqcmd196 commented 10 months ago

@wenbowen123 Thank you for your response!

> What is the rough size of your object? It seems to be very large.

The object is a chair, about 880 mm high, 600 mm wide, and 600 mm deep.

> Reading your log again, it seems the tracking gets lost in the middle. Can you check whether the video segmentation works correctly?

I believe it's fine, as you can see from the created dataset. Or is it a problem that it's a swivel chair, so the seat rotates relative to the legs?

wenbowen123 commented 10 months ago

In your video the seat rotates, which will be an issue for BundleSDF since it deals with a single rigid object.

mqcmd196 commented 10 months ago

I see, I'll try masking the seat only.

mqcmd196 commented 10 months ago

@wenbowen123 It still raises the same error. The dataset I tried is here.

cp: cannot stat '/home/obinata/Programs/BundleSDF/bundlesdf_original_dataset///nerf_with_bundletrack_online/image_step_*.png': No such file or directory
Traceback (most recent call last): 
  File "run_custom.py", line 223, in <module>
    run_one_video(video_dir=args.video_dir, out_folder=args.out_folder, use_segmenter=args.use_segmenter, use_gui=args.use_gui)
  File "run_custom.py", line 107, in run_one_video
    run_one_video_global_nerf(out_folder=out_folder)
  File "run_custom.py", line 152, in run_one_video_global_nerf
    tracker.run_global_nerf(reader=reader, get_texture=True, tex_res=512)
  File "/home/obinata/Programs/BundleSDF/BundleSDF/bundlesdf.py", line 747, in run_global_nerf
    mesh,sigma,query_pts = nerf.extract_mesh(voxel_size=self.cfg_nerf['mesh_resolution'],isolevel=0, return_sigma=True)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/obinata/Programs/BundleSDF/BundleSDF/nerf_runner.py", line 1363, in extract_mesh
    query_pts = torch.tensor(np.stack(np.meshgrid(tx, ty, tz, indexing='ij'), -1).astype(np.float32).reshape(-1,3)).float().cuda()
  File "<__array_function__ internals>", line 200, in meshgrid
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/numpy/lib/function_base.py", line 5045, in meshgrid
    output = [x.copy() for x in output]
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/numpy/lib/function_base.py", line 5045, in <listcomp>
    output = [x.copy() for x in output]
numpy.core._exceptions.MemoryError: Unable to allocate 254. GiB for an array with shape (3244, 3244, 3244) and data type float64
Process Process-4:
Traceback (most recent call last): 
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/obinata/Programs/BundleSDF/BundleSDF/bundlesdf.py", line 89, in run_nerf
    join = p_dict['join']
  File "<string>", line 2, in __getitem__
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod
    kind, result = conn.recv()
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
wenbowen123 commented 7 months ago

Did you save the output folder (when debug >= 4) so that it can be shared? One thing I noticed is that in the first frame the chair is not visible. It's best to trim the video so that it starts from a point where the chair is not occluded.
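A minimal sketch of that trimming step, assuming the custom-dataset layout of rgb/, depth/, and masks/ folders with matching, sortable filenames plus cam_K.txt (start_idx, the first frame where the chair is fully visible, is a placeholder):

```python
import os
import shutil

# Copy frames from start_idx onward into a trimmed dataset, keeping the
# rgb/depth/masks triplets aligned (layout assumed; adjust to your data).
src, dst = 'original_dataset', 'original_dataset_trimmed'
start_idx = 30  # placeholder: first frame where the object is fully visible

for sub in ('rgb', 'depth', 'masks'):
    os.makedirs(os.path.join(dst, sub), exist_ok=True)
    for name in sorted(os.listdir(os.path.join(src, sub)))[start_idx:]:
        shutil.copy(os.path.join(src, sub, name), os.path.join(dst, sub, name))

shutil.copy(os.path.join(src, 'cam_K.txt'), os.path.join(dst, 'cam_K.txt'))
```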