NVlabs / CF-3DGS

Immediate stop at training progress 0% #24

Open willyawan16 opened 1 month ago

willyawan16 commented 1 month ago

Has anyone else run into this problem? I am trying to train on the provided "Tanks/Francis" dataset, but training stops immediately.

(cf3dgs) D:\Research\CF3DGS\CF-3DGS>python run_cf3dgs.py -s data/Tanks/Francis --mode train
Downloading: "https://github.com/intel-isl/MiDaS/zipball/master" to C:\Users\PC21/.cache\torch\hub\master.zip
Rotation type : 6d
Reading camera 150/150
Loading Training Cameras
Loading Test Cameras
Number of points at initialisation :  19361
Train images:  131
['000401', '000403', '000405', '000407', '000411', '000413', '000415', '000417', '000419', '000421', '000423', '000427', '000429', '000431', '000433', '000435', '000437', '000439', '000443', '000445', '000447', '000449', '000451', '000453', '000455', '000459', '000461', '000463', '000465', '000467', '000469', '000471', '000475', '000477', '000479', '000481', '000483', '000485', '000487', '000491', '000493', '000495', '000497', '000499', '000501', '000503', '000507', '000509', '000511', '000513', '000515', '000517', '000519', '000523', '000525', '000527', '000529', '000531', '000533', '000535', '000539', '000541', '000543', '000545', '000547', '000549', '000551', '000555', '000557', '000559', '000561', '000563', '000565', '000567', '000571', '000573', '000575', '000577', '000579', '000581', '000583', '000587', '000589', '000591', '000593', '000595', '000597', '000599', '000603', '000605', '000607', '000609', '000611', '000613', '000615', '000619', '000621', '000623', '000625', '000627', '000629', '000631', '000635', '000637', '000639', '000641', '000643', '000645', '000647', '000651', '000653', '000655', '000657', '000659', '000661', '000663', '000667', '000669', '000671', '000673', '000675', '000677', '000679', '000683', '000685', '000687', '000689', '000691', '000693', '000695', '000699']
Using cache found in C:\Users\PC21/.cache\torch\hub\intel-isl_MiDaS_master
Using cache found in C:\Users\PC21/.cache\torch\hub\intel-isl_MiDaS_master
D:\Research\CF3DGS\CF-3DGS\trainer\trainer.py:493: DeprecationWarning: Since kornia 0.7.0 the `depth_to_3d` is deprecated in favor of `depth_to_3d_v2`. This function will be replaced with the `depth_to_3d_v2` behaviour, where the that does not require the creation of a meshgrid. The return shape can be not backward compatible between these implementations.
  pts = depth_to_3d(depth_tensor[None, None],
Number of points at initialisation :  272338
optimizing frame 000
Training progress:   0%|                                                                      | 0/1000 [00:00<?, ?it/s]
willyawan16 commented 1 month ago

After tracing the code, I found that it fails when retrieving the tensor from self.P, which is a list[LieGroupParameter]. Every time the code accesses a LieGroupParameter, the process exits abruptly and no output is shown. Is there any solution to this? Screenshots: 1, 2, 3; inside the render function: 4, 5.

Wang-Chbo commented 1 month ago

Have you fixed it? I get the same bug.

OasisYang commented 1 month ago

Does this problem occur only with Tanks/Francis? Can you print self.P or check self.seq_len?

willyawan16 commented 1 month ago

It occurs with all datasets (including the provided Tanks and CO3D ones). Printing self.seq_len is not a problem; the problem lies in self.P. When I try to print self.P, the program stops immediately. Here I print them in the function init_RT_seq(), which is called from init_two_view():

[Screenshot: Snipaste_2024-07-15_13-53-11]

OasisYang commented 1 month ago

I guess this error comes from a failed installation of Lietorch. You can check its official repo and see whether the simple examples provided there run on your machine.
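A minimal sketch of such a check, assuming the crash is a hard exit of the interpreter rather than a Python exception. The snippet runs in a child process started with `-X faulthandler`, so a crash inside the compiled extension cannot kill the calling script, and any stack dump is captured on stderr. The helper names `run_snippet`, `describe` and `check_lietorch` are made up for illustration; the only Lietorch calls used are `SE3.exp`, `.inv()`, `*` and `.matrix()`.

```python
import subprocess
import sys

# Hypothetical smoke test: exercises the Lietorch CUDA kernels on a small batch.
LIETORCH_SNIPPET = """
import torch
from lietorch import SE3

xi = torch.randn(4, 6, device="cuda")   # random se(3) tangent vectors
T = SE3.exp(xi)                          # forward pass through the CUDA kernels
print((T * T.inv()).matrix().cpu())      # should be close to 4x4 identity matrices
"""


def run_snippet(code, timeout=300):
    """Run `code` in a child interpreter so a hard crash cannot kill the caller."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe(result):
    """Summarise a finished child process in one line."""
    if result.returncode == 0:
        return "ok"
    return f"failed with exit code {result.returncode}"


def check_lietorch():
    """Run the Lietorch snippet and print everything the child produced."""
    result = run_snippet(LIETORCH_SNIPPET)
    print(describe(result))
    print(result.stdout)
    print(result.stderr)  # Python traceback or faulthandler dump, if any
    return result
```

Calling `check_lietorch()` from the cf3dgs environment should print "ok" and four near-identity matrices if the extension works. A non-zero exit code with no Python traceback on stderr would be consistent with a crash inside the compiled extension.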

willyawan16 commented 1 month ago

@OasisYang When I run the test examples provided by Lietorch, the immediate stop also happens at "Testing lietorch forward pass (GPU)". I suspect it has something to do with a memory leak. May I ask which computer specification and OS you use to run this code? I would like to compare them with mine. Thank you.

My computer specs are as follows:
OS: Win10
CPU: i7-9700
GPU: RTX 2080
RAM: 32 GB
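For comparing setups, a small sketch that collects the values usually worth lining up when a compiled CUDA extension crashes: OS, Python version, PyTorch version, the CUDA version PyTorch was built with, and the GPU name and compute capability. The function name `collect_env` is made up for illustration, and the script falls back to "not installed" if PyTorch cannot be imported.

```python
import platform


def collect_env():
    """Collect OS, Python, PyTorch and GPU details into a plain dict."""
    info = {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
    }
    try:
        import torch
    except ImportError:
        info["torch"] = "not installed"
        return info

    info["torch"] = torch.__version__
    info["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of the first visible GPU
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


def format_env(info):
    """Render the dict as 'key: value' lines that can be pasted into a comment."""
    return "\n".join(f"{key}: {value}" for key, value in info.items())
```

Running `print(format_env(collect_env()))` in the cf3dgs environment gives output that can be pasted directly into the thread.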

iampalop commented 1 month ago

I get the same issue. Have you solved it yet?

JHXion9 commented 3 weeks ago

I reinstalled Lietorch and that solved the problem. The Eigen archive I used: eigen-824272cde8ca2541e8b67b0887f5ded92b128d1f.zip. The run command I used is as follows:

python run_cf3dgs.py -s ./data/cat/ --mode train --data_type custom --depth_model_type depth_anything