SJoJoK / VGOS

[IJCAI 2023] Official code for the paper "VGOS: Voxel Grid Optimization for View Synthesis from Sparse Inputs".

Error when running training example #1

Closed: jpsml closed this issue 9 months ago

jpsml commented 9 months ago

When I run the following command:

python run.py --config configs/llff/room.py --render_test_get_metric

I get the error below:

scene_rep_reconstruction (fine): iter 2000 / Loss: 0.004814889 / PSNR: 26.94 / Eps: 00:00:37
  7%|▋         | 1994/30000 [00:36<05:24, 86.27it/s]
Testing (756, 1008, 3): 100%|██████████| 2/2 [00:01<00:00, 1.61it/s]
Testing (756, 1008, 3): 100%|██████████| 6/6 [00:19<00:00, 3.31s/it]
  7%|▋         | 1999/30000 [01:03<14:44, 31.64it/s]
Traceback (most recent call last):
  File "run.py", line 1425, in <module>
    train(args, cfg, data_dict, writer)
  File "run.py", line 1246, in train
    scene_rep_reconstruction(
  File "run.py", line 1152, in scene_rep_reconstruction
    HW=HW[i_random_val],
IndexError: index 61 is out of bounds for axis 0 with size 41
wandb: Waiting for W&B process to finish... (failed 1).
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [84,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
(the same assertion is repeated for threads [85,0,0] through [95,0,0])

Any clue as to what is going wrong?

SJoJoK commented 9 months ago

Hello @jpsml, thanks for raising the issue! The log indicates that the index of the image chosen for validation is out of bounds for axis 0, which has a size of 41. This part of the code was written for a specific experimental setup, and I hardcoded the validation indices for convenience during my experiments. As a temporary workaround, you can train with --i_random_val 100000 to bypass this step. I will release an updated version that addresses this issue soon.
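
To illustrate what goes wrong, here is a minimal sketch; the array shape and the helper names are hypothetical, but the indexing pattern matches the failing line in run.py:

import numpy as np

# Hypothetical example: HW stores one (height, width) pair per loaded image.
HW = np.zeros((41, 2), dtype=int)   # this scene only provides 41 images
i_random_val = [61]                 # hardcoded validation index from my experiments
print(HW[i_random_val])             # IndexError: index 61 is out of bounds for axis 0 with size 41

Assuming you keep the original arguments and simply pass the extra flag on the command line, the full workaround command would be:

python run.py --config configs/llff/room.py --render_test_get_metric --i_random_val 100000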