The code uses 2 GPUs for testing. When I ran it as is, I got this error:
```
(ViP_NeRF_GPU) [kapilc@eceaiws src]$ python NerfLlffTrainerTester01.py
Program started at 01/09/2023 12:14:41 PM
Loading visibility prior mask: ../data/databases/NeRF_LLFF/data/all/visibility_prior/VW02/fern/visibility_masks/0006_0013.png
Loading visibility prior mask: ../data/databases/NeRF_LLFF/data/all/visibility_prior/VW02/fern/visibility_masks/0013_0006.png
Training 11/fern begins...
Resuming Training from iteration 50001
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50000/50000 [00:00<?, ?it/s]
Loaded Model in train0011/fern/Model_Iter050000 trained for 50000 iterations
fern: 0%| | 0/5 [00:00<?, ?it/s]
module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu
Traceback (most recent call last):
  File "NerfLlffTrainerTester01.py", line 993, in <module>
    main()
  File "NerfLlffTrainerTester01.py", line 980, in main
    demo1a()
  File "NerfLlffTrainerTester01.py", line 348, in demo1a
    start_testing(test_configs)
  File "NerfLlffTrainerTester01.py", line 101, in start_testing
    Tester.start_testing(test_configs, scenes_data, save_depth=True, save_depth_var=True, save_visibility=True)
  File "/home/kapilc/HARSHA/ViP-NeRF/src/Tester01.py", line 210, in start_testing
    predictions = tester.predict_frame(tgt_pose, view_tgt_pose, secondary_poses,
  File "/home/kapilc/HARSHA/ViP-NeRF/src/Tester01.py", line 63, in predict_frame
    output_dict = self.model(input_dict, sec_views_vis=secondary_poses is not None)
  File "/home/kapilc/Softwares/Anaconda/anaconda3/envs/ViP_NeRF_GPU/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kapilc/Softwares/Anaconda/anaconda3/envs/ViP_NeRF_GPU/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 154, in forward
    raise RuntimeError("module must have its parameters and buffers "
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu
Program ended at 01/09/2023 12:14:45 PM
Execution time: 0:00:04.746777
```
However, when I run testing on a single GPU by changing `'device': [0, 1]` to `'device': [0]`, it works and I get test results. I also noticed that training works with 2 GPUs; only testing fails.
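For context, the `RuntimeError` above is raised by `nn.DataParallel.forward()`, which checks that every parameter and buffer of the wrapped module is already on `device_ids[0]`. The following is a minimal diagnostic sketch, assuming PyTorch and a model wrapped in `nn.DataParallel` as the traceback indicates. It lists which tensors are not on the expected device and shows the usual pattern of moving the model to the primary GPU before wrapping it. The helper names `find_off_device_tensors` and `wrap_for_testing` are hypothetical (not part of ViP-NeRF), and I have not checked where `Tester01.py` builds or loads the model.

```python
import torch
import torch.nn as nn


def find_off_device_tensors(module: nn.Module, expected: torch.device):
    """Hypothetical helper: list parameters/buffers that are NOT on `expected`.

    Returns (kind, name, device) tuples; an empty list means everything
    is where nn.DataParallel expects it to be.
    """
    offenders = []
    for name, param in module.named_parameters():
        if param.device != expected:
            offenders.append(("param", name, str(param.device)))
    for name, buf in module.named_buffers():
        if buf.device != expected:
            offenders.append(("buffer", name, str(buf.device)))
    return offenders


def wrap_for_testing(model: nn.Module, device_ids):
    """Hypothetical helper: move the whole model onto device_ids[0] first,
    then wrap it in DataParallel only if more than one GPU is requested."""
    if not torch.cuda.is_available():
        # No GPU: keep the plain module on the CPU (no DataParallel wrapper).
        return model.to("cpu")
    primary = torch.device(f"cuda:{device_ids[0]}")
    model = model.to(primary)  # moves all parameters AND buffers
    if len(device_ids) > 1:
        model = nn.DataParallel(model, device_ids=device_ids)
    return model
```

As a usage idea (untested against the ViP-NeRF code), calling `find_off_device_tensors(self.model.module, torch.device("cuda:0"))` just before `self.model(input_dict, ...)` in `predict_frame` should show which parameter or buffer is still on the CPU.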