eriksandstroem / Point-SLAM

Point-SLAM: Dense Neural Point Cloud-based SLAM
Apache License 2.0

Typos in the dataset's YAML files and README.md | A problem encountered with 'Mapper.py' and 'neural_point.py' #7

Closed Deng-King closed 11 months ago

Deng-King commented 12 months ago

Hi @eriksandstroem,

Thank you for your nice work! However, I noticed a few typos in the dataset's YAML files and the README.

When I ran the command python run.py configs/Replica/room0.yaml, I received an error: FileNotFoundError: [Errno 2] No such file or directory: 'Datasets/Replica/room0/traj.txt'. In the project directory the folder is actually '.../datasets/...', so this can be fixed by replacing every instance of 'input_folder: Datasets/...' with 'input_folder: datasets/...' in the YAML files under './configs/Replica/'.
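
For convenience, here is a hypothetical one-off snippet that applies the replacement to all Replica configs (assuming the key is spelled input_folder, as in the configs shipped with the repo):

from pathlib import Path

# One-off fix: lowercase 'Datasets' in the dataset path of every Replica config.
for cfg in Path("configs/Replica").glob("*.yaml"):
    text = cfg.read_text()
    cfg.write_text(text.replace("input_folder: Datasets/", "input_folder: datasets/"))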

Additionally, the README says to use conda env create -f environment.yaml & conda activate point-slam-env to create the virtual environment for this project. However, according to the file 'env.yaml', the correct commands should be conda env create -f env.yaml & conda activate point-slam.

I hope this helps!


After correcting the typos above, I re-ran the command python run.py configs/Replica/room0.yaml and encountered the following error:

(point-slam) ~/code/Point-SLAM$ python run.py configs/Replica/room0.yaml

⭐️ INFO: The output folder is output/Replica/room0/20231007_173317
⭐️ INFO: The GT, generated and residual depth/color images can be found under output/Replica/room0/20231007_173317/tracking_vis/ and output/Replica/room0/20231007_173317/mapping_vis/
⭐️ INFO: The mesh can be found under output/Replica/room0/20231007_173317/mesh/
⭐️ INFO: The checkpoint can be found under output/Replica/room0/20231007_173317/ckpt/

Mapping Frame  0

/home/miniconda3/envs/point-slam/lib/python3.10/site-packages/faiss/contrib/torch_utils.py:51: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  x.storage().data_ptr() + x.storage_offset() * 4)
Process mapper:
Traceback (most recent call last):
  File "/home/miniconda3/envs/point-slam/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/miniconda3/envs/point-slam/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/code/Point-SLAM/src/Point_SLAM.py", line 187, in mapping
    self.mapper.run(time_string)
  File "/home/code/Point-SLAM/src/Mapper.py", line 735, in run
    _ = self.optimize_map(num_joint_iters, idx, gt_color, gt_depth, gt_c2w,
  File "/home/code/Point-SLAM/src/Mapper.py", line 317, in optimize_map
    _ = self.npc.add_neural_points(batch_rays_o, batch_rays_d, batch_gt_depth, batch_gt_color,
  File "<string>", line 2, in add_neural_points
  File "/home/miniconda3/envs/point-slam/lib/python3.10/multiprocessing/managers.py", line 833, in _callmethod
    raise convert_to_error(kind, result)
ValueError: not enough values to unpack (expected 2, got 1)
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

After extensive debugging, I found that batch_rays_o, batch_rays_d, batch_gt_depth and batch_gt_color in 'Mapper.py' keep their original shapes but are all set to zero after being passed to the function _ = self.npc.add_neural_points(batch_rays_o, batch_rays_d, batch_gt_depth, batch_gt_color, dynamic_radius=self.dynamic_r_add[j, i] if self.use_dynamic_radius else None) (line 317 in 'Mapper.py').

For instance, I printed their sums and element counts to the screen by making some modifications in 'Mapper.py':

(line 316)
            print('##### TEST!! [Mapper]',torch.sum(batch_rays_o), torch.numel(batch_rays_o))
            print('##### TEST!! [Mapper]',torch.sum(batch_rays_d), torch.numel(batch_rays_d))
            print('##### TEST!! [Mapper]',torch.sum(batch_gt_depth), torch.numel(batch_gt_depth))
            print('##### TEST!! [Mapper]',torch.sum(batch_gt_color), torch.numel(batch_gt_color))
            # batch_rays_o = batch_rays_o.clone()
            _ = self.npc.add_neural_points(batch_rays_o, batch_rays_d, batch_gt_depth, batch_gt_color,
                                           dynamic_radius=self.dynamic_r_add[j, i] if self.use_dynamic_radius else None)

and in 'neural_point.py':

(line 90) 
    def add_neural_points(self, batch_rays_o, batch_rays_d, batch_gt_depth, batch_gt_color,
                          train=False, is_pts_grad=False, dynamic_radius=None):
        print('======')
        print('##### TEST!! the program has entered the function add_neural_points()')
        print('##### TEST!! [add_neural_points()]',torch.sum(batch_rays_o), torch.numel(batch_rays_o))
        print('##### TEST!! [add_neural_points()]',torch.sum(batch_rays_d), torch.numel(batch_rays_d))
        print('##### TEST!! [add_neural_points()]',torch.sum(batch_gt_depth), torch.numel(batch_gt_depth))
        print('##### TEST!! [add_neural_points()]',torch.sum(batch_gt_color), torch.numel(batch_gt_color))
        print('======')

        if batch_rays_o.shape[0]:
            print('TEST!! [add_neural_points()] batch_rays_o.shape[0] is True when the error occurs')
            mask = batch_gt_depth > 0
            batch_gt_color = batch_gt_color*255
            batch_rays_o, batch_rays_d, batch_gt_depth, batch_gt_color = \
                batch_rays_o[mask], batch_rays_d[mask], batch_gt_depth[mask], batch_gt_color[mask]

            ...
            ...

            if train or not self.index.is_trained:
                self.index.train(pts)
            print('##### TEST!! [add_neural_points()] Where will it end?')
            print('##### TEST!! [add_neural_points()]',type(self._cloud_pos))
            print('##### TEST!! [add_neural_points()]',self._cloud_pos)
            self.index.train(torch.tensor(self._cloud_pos, device=self.device))
            print('##### TEST!! [add_neural_points()] can it get here?')
            self.index.add(pts)
            print(torch.sum(mask))
            return torch.sum(mask)
        else:
            print('##### TEST!! [add_neural_points()] will the error occur here if batch_rays_o.shape[0] is False?')
            return 0

We can see that the process terminates at self.index.train(torch.tensor(self._cloud_pos, device=self.device)), with the following screen output:

(point-slam) ~/code/Point-SLAM$ python run.py

⭐️ INFO: The output folder is output/Replica/room0/20231007_211805
⭐️ INFO: The GT, generated and residual depth/color images can be found under output/Replica/room0/20231007_211805/tracking_vis/ and output/Replica/room0/20231007_211805/mapping_vis/
⭐️ INFO: The mesh can be found under output/Replica/room0/20231007_211805/mesh/
⭐️ INFO: The checkpoint can be found under output/Replica/room0/20231007_211805/ckpt/

Mapping Frame  0

##### TEST!! [Mapper] tensor(31279.0273, device='cuda:0') 20847
##### TEST!! [Mapper] tensor(-11126.9150, device='cuda:0') 20847
##### TEST!! [Mapper] tensor(18767.3262, device='cuda:0') 6949
##### TEST!! [Mapper] tensor(12228.9961, device='cuda:0', dtype=torch.float64) 20847
======
##### TEST!! the program has entered the function add_neural_points()
##### TEST!! [add_neural_points()] tensor(0., device='cuda:0') 20847
##### TEST!! [add_neural_points()] tensor(0., device='cuda:0') 20847
##### TEST!! [add_neural_points()] tensor(0., device='cuda:0') 6949
##### TEST!! [add_neural_points()] tensor(0., device='cuda:0', dtype=torch.float64) 20847
======
TEST!! [add_neural_points()] batch_rays_o.shape[0] is True when the error occurs
/home/miniconda3/envs/point-slam/lib/python3.10/site-packages/faiss/contrib/torch_utils.py:51: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  x.storage().data_ptr() + x.storage_offset() * 4)
##### TEST!! [add_neural_points()] Where will it end?
##### TEST!! [add_neural_points()] <class 'list'>
##### TEST!! [add_neural_points()] []
Process mapper:
Traceback (most recent call last):
  File "/home/miniconda3/envs/point-slam/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/miniconda3/envs/point-slam/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/code/Point-SLAM/src/Point_SLAM.py", line 187, in mapping
    self.mapper.run(time_string)
  File "/home/code/Point-SLAM/src/Mapper.py", line 741, in run
    _ = self.optimize_map(num_joint_iters, idx, gt_color, gt_depth, gt_c2w,
  File "/home/code/Point-SLAM/src/Mapper.py", line 323, in optimize_map
    _ = self.npc.add_neural_points(batch_rays_o, batch_rays_d, batch_gt_depth, batch_gt_color,
  File "<string>", line 2, in add_neural_points
  File "/home/miniconda3/envs/point-slam/lib/python3.10/multiprocessing/managers.py", line 833, in _callmethod
    raise convert_to_error(kind, result)
ValueError: not enough values to unpack (expected 2, got 1)
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
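
For what it's worth, the unpack error itself looks consistent with the faiss torch wrapper being asked to train on the empty self._cloud_pos list; below is a minimal sketch of what I suspect happens inside the wrapper (assuming it unpacks x.shape into (n, d); I have not checked the installed faiss source).

import torch

# An empty Python list becomes a 1-D tensor of shape (0,), not an (n, 3) point matrix,
# so unpacking its shape into two values fails.
pts = torch.tensor([])   # torch.Size([0])
n, d = pts.shape         # ValueError: not enough values to unpack (expected 2, got 1)

That would explain the error message, but not why the input tensors arrive as all zeros in the first place.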

I have spent 8 hours on this and I still have no idea why it is happening (and neither do New Bing and ChatGPT). :(

Thank you for taking the time to check this issue. I deeply appreciate any help you can provide.

(The code is running on WSL 2.0 with Ubuntu 22.04.2 LTS and an RTX 3060 Ti, BTW.)

eriksandstroem commented 11 months ago

Hi @Deng-King, Thank you very much for providing the corrections to the config and readme files. Much appreciated! I have updated them now.

Regarding your question, I will look into it. It sounds indeed very strange that the batch_rays_o, batch_rays_d, batch_gt_depth, batch_gt_color in Mapper.py are set to zero, which should not happen.

Just as a sanity check: what version of the faiss library are you using?

Deng-King commented 11 months ago

> Hi @Deng-King, Thank you very much for providing the corrections to the config and readme files. Much appreciated! I have updated them now.
>
> Regarding your question, I will look into it. It sounds indeed very strange that the batch_rays_o, batch_rays_d, batch_gt_depth, batch_gt_color in Mapper.py are set to zero, which should not happen.
>
> Just as a sanity check: what version of the faiss library are you using?

The version of faiss is 1.7.2, which is the same as the version specified in the env.yaml file.

...
exceptiongroup            1.1.3                    pypi_0    pypi
executing                 2.0.0                    pypi_0    pypi
faiss-gpu                 1.7.2                    pypi_0    pypi
fastjsonschema            2.16.2                   pypi_0    pypi
ffmpeg                    4.3                  hf484d3e_0    pytorch
...
eriksandstroem commented 11 months ago

I cannot replicate this behavior. I would therefore hypothesize that the problem is related either to your environment or to the hardware you are running on - for example, we never tried running the code under WSL 2.0. If you have access to a computer running Linux natively, I would try that. Best of luck, and let me know if you have any other questions.

Deng-King commented 11 months ago

I'll try it on a Linux server later. It would be better if you could provide a Docker image of the project so that I could rule out any problems stemming from the environment. For the time being, my virtual environment seems to match env.yaml, and the primary differences are the OS and hardware.

eriksandstroem commented 11 months ago

If I were to take a guess, I would say that your environment is not the issue (multiple people have already installed the environment and got it working on Linux without a problem). I would therefore not spend the time right now making a Docker image before you have tried the pipeline on a Linux machine. Hope that is OK with you.