Error on training with my own data

flamorim commented 1 year ago

First, congrats on the excellent job!!

I could do the training using the 'cozy2room-linear' data, but I'm facing the following error when running with my own data:

python train.py --config configs/Marco.txt
spline numbers: 7
Mismatch between imgs 34 and poses 25 !!!!
Loaded image data (720, 1280, 3, 34) [ 720. 1280. 1022.32821729]
Loaded ./data/nerf_llff_data/Marco 8.838927097722193 72.43632689966341
Pose State: None
Loaded llff torch.Size([34, 720, 1280, 3]) torch.Size([120, 3, 5]) tensor([ 720.0000, 1280.0000, 1022.3282]) ./data/nerf_llff_data/Marco
DEFINING BOUNDS
NEAR FAR 0.0 1.0
Begin
TRAIN views are tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33])
TEST views are tensor([100])
VAL views are tensor([100])
0%| | 0/200002 [00:00<?, ?it/s]
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [96,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [97,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
...
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [94,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [95,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
0%| | 0/200002 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/src/BAD-NeRF/train.py", line 324, in <module>
    train()
  File "/src/BAD-NeRF/train.py", line 161, in train
    ret, ray_idx, spline_poses = graph.forward(i, img_idx, poses_num, H, W, K, args)
  File "/src/BAD-NeRF/nerf.py", line 191, in forward
    spline_poses = self.get_pose(i, img_idx, args)
  File "/src/BAD-NeRF/optimize_pose_linear.py", line 50, in get_pose
    spline_poses = Spline.SplineN_linear(se3_start, se3_end, pose_nums, args.deblur_images)
  File "/src/BAD-NeRF/Spline.py", line 186, in SplineN_linear
    pos_0 = torch.where(pose_time == 0)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Could someone help me find the solution? The "poses_bounds.npy" file was generated by the LLFF code.

Thanks in advance, Flávio.

wangpeng000 commented 1 year ago

Hello, I think these errors are caused by a mismatch between the number of your training images and the number of corresponding camera poses. In addition, you can read the "Your own data" part of my README.md for a detailed explanation, especially regarding the parameter novel_view.
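
A quick way to confirm such a mismatch before training is to count both sides directly. The following is only a minimal sketch, not part of BAD-NeRF: the function names, the "images" subfolder, and the list of image extensions are assumptions for illustration, and it relies on the usual LLFF layout in which poses_bounds.npy is an array with one row of 17 values per posed image. What relationship the two counts should have depends on how novel_view is set, as described in the README, so the script only reports the numbers.

```python
from pathlib import Path

import numpy as np

# Assumed set of image extensions; adjust to your data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def count_images(image_dir):
    """Count image files directly inside image_dir (non-recursive)."""
    return sum(
        1
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def count_poses(poses_path):
    """Return the number of rows in an LLFF-style poses_bounds.npy.

    Assumes the usual LLFF layout: shape (N, 17), i.e. a flattened
    3x5 pose matrix plus two depth bounds per image.
    """
    poses_bounds = np.load(poses_path)
    if poses_bounds.ndim != 2 or poses_bounds.shape[1] != 17:
        raise ValueError(f"unexpected poses_bounds shape: {poses_bounds.shape}")
    return poses_bounds.shape[0]


def check_scene(scene_dir, image_subdir="images"):
    """Return (n_images, n_poses, counts_match) for a scene folder.

    image_subdir is a hypothetical default; use the folder your config reads.
    """
    scene = Path(scene_dir)
    n_images = count_images(scene / image_subdir)
    n_poses = count_poses(scene / "poses_bounds.npy")
    return n_images, n_poses, n_images == n_poses


if __name__ == "__main__":
    import sys

    n_images, n_poses, ok = check_scene(sys.argv[1])
    print(f"images: {n_images}, poses: {n_poses}, match: {ok}")
```

Run on the scene folder (for example ./data/nerf_llff_data/Marco), this should show whether the image and pose counts agree; the "Mismatch between imgs 34 and poses 25" line in the log above suggests they did not in this case.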

flamorim commented 1 year ago

Hi Wang, I was able to solve the problem by replacing my images; they probably had excessive blur. Thanks very much!

WU-CVGL / BAD-NeRF

Error on training with my own data #4