Failed to run run_custom.py command on milk dataset

percypeng5221 commented 9 months ago

I ran run_custom.py on the milk dataset, but the output looks like this:


(py38) root@percyp:/home/percyp/BundleSDF# python run_custom.py --mode run_video --video_dir /home/percyp/milk --out_folder /home/percyp/milk2 --use_segmenter 1 --use_gui 1 --debug_level 2
[2023-11-20 14:46:05.433] [warning] [Bundler.cpp:49] Connected to nerf_port 9999
[2023-11-20 14:46:05.433] [warning] [FeatureManager.cpp:2084] Connected to port 5555
default_cfg {'backbone_type': 'ResNetFPN', 'resolution': (8, 2), 'fine_window_size': 5, 'fine_concat_coarse_feat': True, 'resnetfpn': {'initial_dim': 128, 'block_dims': [128, 196, 256]}, 'coarse': {'d_model': 256, 'd_ffn': 256, 'nhead': 8, 'layer_names': ['self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross'], 'attention': 'linear', 'temp_bug_fix': False}, 'match_coarse': {'thr': 0.2, 'border_rm': 2, 'match_type': 'dual_softmax', 'dsmax_temperature': 0.1, 'skh_iters': 3, 'skh_init_bin_score': 1.0, 'skh_prefilter': True, 'train_coarse_percent': 0.4, 'train_pad_num_gt_min': 200}, 'fine': {'d_model': 128, 'd_ffn': 128, 'nhead': 8, 'layer_names': ['self', 'cross'], 'attention': 'linear'}}
GUI started
libGL error: MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
libGL error: failed to load driver: swrast
Glfw Error 65543: GLX: Failed to create context: GLXBadFBConfig
python: /home/runner/work/DearPyGui/DearPyGui/thirdparty/glfw/src/window.c:533: glfwSetWindowPos: Assertion `window != NULL' failed.
^CTraceback (most recent call last):
  File "run_custom.py", line 223, in <module>
    run_one_video(video_dir=args.video_dir, out_folder=args.out_folder, use_segmenter=args.use_segmenter, use_gui=args.use_gui)
  File "run_custom.py", line 103, in run_one_video
    tracker.run(color, depth, K, id_str, mask=mask, occ_mask=None, pose_in_model=pose_in_model)
  File "/home/percyp/BundleSDF/bundlesdf.py", line 520, in run
    with self.gui_lock:
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Process Process-3:
Traceback (most recent call last):
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/percyp/BundleSDF/bundlesdf.py", line 128, in run_nerf
    time.sleep(0.01)
KeyboardInterrupt
[2023-11-20 14:47:11.856] [warning] [Bundler.cpp:59] Destructor

It gets stuck partway through, so I had to stop the program. I checked the CPU and GPU and neither of them was doing any work, so I think something went wrong.
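One way to narrow down the libGL error above is to check whether swrast_dri.so actually exists in the directories the loader reports searching. A minimal sketch, assuming only the two concrete search paths from the log (the ORIGIN-relative entry is skipped); the helper name find_dri_driver is made up for illustration:

```python
from pathlib import Path

# Concrete directories copied from the "search paths" part of the libGL error.
SEARCH_PATHS = ["/usr/lib/x86_64-linux-gnu/dri", "/usr/lib/dri"]


def find_dri_driver(name="swrast", search_paths=SEARCH_PATHS):
    """Return every existing <name>_dri.so under search_paths (hypothetical helper)."""
    candidates = [Path(d) / f"{name}_dri.so" for d in search_paths]
    return [str(p) for p in candidates if p.exists()]


if __name__ == "__main__":
    found = find_dri_driver()
    if found:
        print("swrast driver found at:", ", ".join(found))
    else:
        print("swrast_dri.so not found in:", ", ".join(SEARCH_PATHS))
```

An empty result would suggest the Mesa software driver is not installed where the process is running (for example inside the container), though that is only one possible cause of this error.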

wenbowen123 commented 9 months ago

Does your desktop have a display? If not, you could also try running without the GUI by passing --use_gui 0.
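For scripted runs, the same decision can be made automatically. This is only a sketch: it assumes that an unset DISPLAY environment variable means no X11 display is reachable, and the function names gui_available and pick_use_gui are hypothetical, not part of BundleSDF:

```python
import os


def gui_available(env=None):
    """Heuristic: treat a non-empty DISPLAY variable as 'a display exists'."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY"))


def pick_use_gui(requested, env=None):
    """Return 1 only if the GUI was requested and DISPLAY is set, else 0."""
    return 1 if requested and gui_available(env) else 0


if __name__ == "__main__":
    # The printed value could be passed to run_custom.py as --use_gui.
    print("--use_gui", pick_use_gui(1))
```

Note that DISPLAY being set does not guarantee that a GLX context can be created, so a result of 1 here can still end in the GLXBadFBConfig error.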

percypeng5221 commented 9 months ago

My desktop has a display. I'd prefer to keep using the GUI because I want to test this algorithm inside a larger framework.

wenbowen123 commented 9 months ago

Can you provide more info (OS, GPU version, CUDA version)?

percypeng5221 commented 9 months ago

This is my output from nvidia-smi:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   57C    P8              23W /  80W |     17MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1229      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A      1982      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+

My system is Ubuntu 20.04.6 LTS (focal), and this is my docker info:

Client: Docker Engine - Community
 Version:    24.0.7
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.21.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 1
 Server Version: 24.0.7
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 61f9fd88f79f081d64d6fa3bb1a0dc71ec870523
 runc version: v1.1.9-0-gccaecfc
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
 Kernel Version: 5.15.0-88-generic
 Operating System: Ubuntu 20.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 31.08GiB
 Name: percyp
 ID: 5d5306fc-c351-4573-9cb3-143edeb69e03
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

wenbowen123 commented 9 months ago

I've never seen this issue before. If you don't use the GUI, are you able to run? I know you need the GUI, I'm just curious.

percypeng5221 commented 9 months ago

Well, it seems that without the GUI the code can run. I've attached the output (log.txt). But it seems like I don't have this package:

Process Process-2:
Traceback (most recent call last):
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/percyp/BundleSDF/bundlesdf.py", line 219, in run_nerf
    nerf = NerfRunner(cfg_nerf,rgbs,depths=depths,masks=masks,normal_maps=normal_maps,occ_masks=occ_masks,poses=poses,K=K,build_octree_pcd=pcd_normalized)
  File "/home/percyp/BundleSDF/nerf_runner.py", line 156, in __init__
    self.create_nerf()
  File "/home/percyp/BundleSDF/nerf_runner.py", line 208, in create_nerf
    embed_fn, input_ch = get_embedder(self.cfg['multires'], self.cfg, i=self.cfg['i_embed'], octree_m=self.octree_m)
  File "/home/percyp/BundleSDF/nerf_helpers.py", line 207, in get_embedder
    from mycuda.torch_ngp_grid_encoder.grid import GridEncoder
  File "/home/percyp/BundleSDF/mycuda/torch_ngp_grid_encoder/grid.py", line 23, in <module>
    import gridencoder
ModuleNotFoundError: No module named 'gridencoder'

ArghyaChatterjee commented 9 months ago

I am also having the same issue when running without the GUI. How can I get rid of it? @percypeng5221 @wenbowen123 were you able to solve the issue?

[2023-11-25 09:44:51.998] [warning] [FeatureManager.cpp:1589] start multi pair ransac GPU, pairs#=1
[2023-11-25 09:44:51.999] [warning] [FeatureManager.cpp:1699] ransac makes match betwee frame 0090 0089 #inliers=755, #prev 764
[bundlesdf.py] frame 0090 pose update before
[[ 0.992  0.113 -0.054 -0.039]
 [-0.087  0.933  0.348 -0.251]
 [ 0.089 -0.34   0.936 -0.421]
 [ 0.     0.     0.     1.   ]]
[2023-11-25 09:44:52.001] [warning] [FeatureManager.cpp:1095] procrustesByCorrespondence err per point between 0090 and 0089: 4.11612e-05
[bundlesdf.py] frame 0090 pose update after
[[ 0.99   0.124 -0.064 -0.035]
 [-0.092  0.927  0.365 -0.258]
 [ 0.105 -0.355  0.929 -0.418]
 [ 0.     0.     0.     1.   ]]
[2023-11-25 09:44:52.001] [warning] [Bundler.cpp:67] forgetting frame 0083
[2023-11-25 09:44:52.001] [warning] [FeatureManager.cpp:469] forgetting frame 0083
[bundlesdf.py] exceed window size, forget frame 0083
[2023-11-25 09:44:52.002] [warning] [Bundler.cpp:435] total keyframes=4, want to select 10
[2023-11-25 09:44:52.002] [warning] [Bundler.cpp:793] frame 0090 and 0000 visible=0.942166
[2023-11-25 09:44:52.002] [warning] [Bundler.cpp:802] add frame (0090, 0000) into pairs
[2023-11-25 09:44:52.002] [warning] [Bundler.cpp:793] frame 0090 and 0078 visible=0.941756
[2023-11-25 09:44:52.002] [warning] [Bundler.cpp:802] add frame (0090, 0078) into pairs
[2023-11-25 09:44:52.002] [warning] [Bundler.cpp:793] frame 0090 and 0082 visible=0.945857
[2023-11-25 09:44:52.002] [warning] [Bundler.cpp:802] add frame (0090, 0082) into pairs
[2023-11-25 09:44:52.002] [warning] [Bundler.cpp:793] frame 0090 and 0086 visible=0.945037
[2023-11-25 09:44:52.002] [warning] [Bundler.cpp:802] add frame (0090, 0086) into pairs
[bundlesdf.py] frame_pairs: 4
[loftr_wrapper.py] image0: torch.Size([4, 1, 400, 400])
[loftr_wrapper.py] net forward
[loftr_wrapper.py] mconf, 0.2003738433122635 0.998960018157959
[loftr_wrapper.py] pair_ids (2955,)
[loftr_wrapper.py] corres: (2955, 5)
[2023-11-25 09:44:52.222] [warning] [FeatureManager.cpp:1589] start multi pair ransac GPU, pairs#=4
[2023-11-25 09:44:52.232] [warning] [FeatureManager.cpp:1699] ransac makes match betwee frame 0090 0000 #inliers=295, #prev 345
[2023-11-25 09:44:52.232] [warning] [FeatureManager.cpp:1699] ransac makes match betwee frame 0090 0078 #inliers=442, #prev 487
[2023-11-25 09:44:52.232] [warning] [FeatureManager.cpp:1699] ransac makes match betwee frame 0090 0082 #inliers=592, #prev 634
[2023-11-25 09:44:52.232] [warning] [FeatureManager.cpp:1699] ransac makes match betwee frame 0090 0086 #inliers=709, #prev 739
#optimizeGPU frames=5, #keyframes=4, #_frames=10
0000 0078 0082 0086 0090 
[2023-11-25 09:44:52.245] [warning] [Bundler.cpp:920] OptimizerGPU begin, global_corres#=5844
global_corres=5844
maxNumResiduals / maxNumberOfImages = 53844 / 5 = 10768
m_maxNumberOfImages*m_maxCorrPerImage = 5 x 2797 = 13985
m_solver->solve Time difference = 18.845[ms]
[2023-11-25 09:44:52.269] [warning] [Bundler.cpp:924] OptimizerGPU finish
[2023-11-25 09:44:52.269] [warning] [Bundler.cpp:320] Added frame 0090 as keyframe, current #keyframe: 5
[bundlesdf.py] processNewFrame done 0090
[bundlesdf.py] 0090 prepare data for nerf
[bundlesdf.py] out_dir: /home/arghya/BundleSDF/ho3d_v3_generated_mesh/MPM10//0090/nerf
[tool.py] compute_scene_bounds_worker start
[tool.py] compute_scene_bounds_worker done
[tool.py] merge pcd
[tool.py] compute_translation_scales done
translation_cvcam=[ 0.00559992  0.00377557 -0.0042612 ], sc_factor=16.757416592922073
[bundlesdf.py] First nerf run, create Runner, latest nerf frame 0090
[nerf_runner.py] Octree voxel dilate_radius:1
Process Process-2:
Traceback (most recent call last):
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/arghya/BundleSDF/bundlesdf.py", line 219, in run_nerf
    nerf = NerfRunner(cfg_nerf,rgbs,depths=depths,masks=masks,normal_maps=normal_maps,occ_masks=occ_masks,poses=poses,K=K,build_octree_pcd=pcd_normalized)
  File "/home/arghya/BundleSDF/nerf_runner.py", line 156, in __init__
    self.create_nerf()
  File "/home/arghya/BundleSDF/nerf_runner.py", line 208, in create_nerf
    embed_fn, input_ch = get_embedder(self.cfg['multires'], self.cfg, i=self.cfg['i_embed'], octree_m=self.octree_m)
  File "/home/arghya/BundleSDF/nerf_helpers.py", line 207, in get_embedder
    from mycuda.torch_ngp_grid_encoder.grid import GridEncoder
  File "/home/arghya/BundleSDF/mycuda/torch_ngp_grid_encoder/grid.py", line 23, in <module>
    import gridencoder
ModuleNotFoundError: No module named 'gridencoder'

percypeng5221 commented 9 months ago

I may choose another algorithm for this task. Based on my recent experience, this one is a little too much for my needs. I'm willing to help you guys solve this issue if you need me, but I might move on later. @wenbowen123 @ArghyaChatterjee

ArghyaChatterjee commented 9 months ago

BTW, I am also having issues with the GUI on. Is there any solution to that one as well, @wenbowen123?

(py38) root@arghya-Pulse-GL66-12UEK:/home/arghya/BundleSDF# python3 run_ho3d.py --use_gui 1
video_dirs:
 ['/home/arghya/BundleSDF/HO3D_v3/evaluation/MPM10']
[2023-11-25 10:28:24.403] [warning] [Bundler.cpp:49] Connected to nerf_port 9999
[2023-11-25 10:28:24.403] [warning] [FeatureManager.cpp:2084] Connected to port 5555
default_cfg {'backbone_type': 'ResNetFPN', 'resolution': (8, 2), 'fine_window_size': 5, 'fine_concat_coarse_feat': True, 'resnetfpn': {'initial_dim': 128, 'block_dims': [128, 196, 256]}, 'coarse': {'d_model': 256, 'd_ffn': 256, 'nhead': 8, 'layer_names': ['self', 'cross', 'self', 'cross', 'self', 'cross', 'self', 'cross'], 'attention': 'linear', 'temp_bug_fix': False}, 'match_coarse': {'thr': 0.2, 'border_rm': 2, 'match_type': 'dual_softmax', 'dsmax_temperature': 0.1, 'skh_iters': 3, 'skh_init_bin_score': 1.0, 'skh_prefilter': True, 'train_coarse_percent': 0.4, 'train_pad_num_gt_min': 200}, 'fine': {'d_model': 128, 'd_ffn': 128, 'nhead': 8, 'layer_names': ['self', 'cross'], 'attention': 'linear'}}
GUI started
libGL error: MESA-LOADER: failed to retrieve device information
Glfw Error 65543: GLX: Failed to create context: GLXBadFBConfig
python3: /home/runner/work/DearPyGui/DearPyGui/thirdparty/glfw/src/window.c:533: glfwSetWindowPos: Assertion `window != NULL' failed.

fedona commented 9 months ago

I am having the same issue "ModuleNotFoundError: No module named 'gridencoder'" as soon as there is a nerf run.

Any news?

wenbowen123 commented 9 months ago

Did you run bash build.sh as mentioned in the README?
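A quick way to confirm whether the build step left gridencoder installed in the active environment is to ask the import system directly. A minimal sketch using only the standard library; module_available is a name chosen here for illustration:

```python
import importlib.util


def module_available(name):
    """True if the top-level module `name` can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("gridencoder"):
        print("gridencoder is importable")
    else:
        print("gridencoder is missing; try running `bash build.sh` again")
```

Run it with the same Python interpreter and environment that runs the BundleSDF scripts, otherwise the answer may not match what run_nerf sees.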

percypeng5221 commented 9 months ago

Yeah, I did

fedona commented 9 months ago

> > I am having the same issue "ModuleNotFoundError: No module named 'gridencoder'" as soon as there is a nerf run.
> >
> > Any news?
>
> Did you run bash build.sh as mentioned in readme?

This was the problem for me :/

cynthia-you commented 9 months ago

> > > I am having the same issue "ModuleNotFoundError: No module named 'gridencoder'" as soon as there is a nerf run. Any news?
> >
> > Did you run bash build.sh as mentioned in readme?
>
> this was the problem for me :/

Just re-execute 'build.sh' inside docker. I have encountered this issue sometimes, and it's very strange: 'build.sh' succeeded and BundleSDF ran correctly, but if I exit the docker container and rerun, the error happens again. My solution is to build again.

wenbowen123 commented 9 months ago

This is more about docker usage: if you don't kill the docker container, you don't need to rebuild. Otherwise you would need to.
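To illustrate the point about keeping the container: a stopped container keeps its filesystem, so restarting it instead of creating a new one should preserve whatever build.sh installed. The sketch below only builds and prints the docker CLI call; the container name my_bundlesdf is a placeholder, and restart_cmd is a hypothetical helper:

```python
import shlex


def restart_cmd(container):
    """Build the docker CLI call that restarts an existing, stopped container."""
    return ["docker", "start", "--attach", "--interactive", container]


if __name__ == "__main__":
    # "my_bundlesdf" is a placeholder; use the name shown by `docker ps -a`.
    print(shlex.join(restart_cmd("my_bundlesdf")))
```

If the container was removed rather than just stopped, there is nothing to restart, and the build has to be repeated in the new container.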

cwchenwang commented 2 months ago

I also hit the same problem when using the GUI, but it works without the GUI. I have already run build.sh. Any suggestions? @wenbowen123

wenbowen123 commented 2 months ago

@cwchenwang what is the error when using the GUI?

NVlabs/BundleSDF issue #121