donydchen / mvsplat

🌊 [ECCV'24 Oral] MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
https://donydchen.github.io/mvsplat
MIT License

Got errors when evaluating. #19

Closed: karaokenoway closed this issue 5 months ago

karaokenoway commented 5 months ago

Hi, great work! I set the CUDA device with `set CUDA_VISIBLE_DEVICES=0` and ran the evaluation command:

```bash
python -m src.main +experiment=acid checkpointing.load=checkpoints/acid.ckpt mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_acid.json test.compute_scores=true
```

But it fails with the following error:

```
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: C:\Users\user.conda\envs\mvsplat\lib\site-packages\lpips\weights\v0.1\vgg.pth
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
[W socket.cpp:663] [c10d] The client socket has failed to connect to [DESKTOP-M7IQG5O]:2765 (system error: 10049
Error executing job with overrides: ['+experiment=acid', 'checkpointing.load=checkpoints/acid.ckpt', 'mode=test', 'dataset/view_sampler=evaluation', 'dataset.view_sampler.index_path=assets/evaluation_index_acid.json', 'test.compute_scores=true']
Traceback (most recent call last):
  File "F:\Github\mvsplat\src\main.py", line 143, in train
    trainer.test(
  File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 754, in test
    return call._call_and_handle_interrupt(
  File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 794, in _test_impl
    results = self._run(model, ckpt_path=ckpt_path)
  File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 943, in _run
    self.strategy.setup_environment()
  File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 154, in setup_environment
    self.setup_distributed()
  File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 203, in setup_distributed
    _init_dist_connection(self.cluster_environment, self._process_group_backend, timeout=self._timeout)
  File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\lightning_fabric\utilities\distributed.py", line 291, in _init_dist_connection
    torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
  File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\torch\distributed\c10d_logger.py", line 74, in wrapper
    func_return = func(*args, **kwargs)
  File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\torch\distributed\distributed_c10d.py", line 1148, in init_process_group
    default_pg, _ = _new_process_group_helper(
  File "C:\Users\user.conda\envs\mvsplat\lib\site-packages\torch\distributed\distributed_c10d.py", line 1268, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
```

I followed the installation instructions, so I don't know what went wrong. Please help!

donydchen commented 5 months ago

Hi @karaokenoway, thanks for your interest in our work.

From your log, it seems that you are running this project on a Windows machine. We have only tested our code on Linux machines, and unfortunately we have no experience running it on Windows...

You might need to look into how to make PyTorch Lightning run on only 1 GPU on Windows, since the log line (`Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2`) suggests the job is being launched across 2 GPUs; a rough sketch is given below.
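As a rough pointer only (we have not tested this on Windows), a single-GPU Lightning setup that avoids the NCCL-based DDP strategy might look like the sketch below. The `Trainer` arguments are standard PyTorch Lightning options; the variable names and how this maps onto the repo's Hydra config are assumptions, not the project's actual code.

```python
# Minimal, generic PyTorch Lightning sketch (not MVSplat's actual src/main.py):
# on Windows the NCCL backend is unavailable, so restricting the run to a single
# device avoids creating a distributed process group in the first place.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",  # run on the CUDA device selected via CUDA_VISIBLE_DEVICES
    devices=1,          # single process / single GPU, so DDP (and NCCL) is never initialised
)

# Hypothetical usage; `model` and `data_module` stand in for the project's
# LightningModule and DataModule instances.
# trainer.test(model, datamodule=data_module, ckpt_path="checkpoints/acid.ckpt")
```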

karaokenoway commented 5 months ago

@donydchen Thank you, and all the best! Really appreciate it.