Open XinWu98 opened 3 years ago

Hi, thanks for releasing your work! I have some problems when fine-tuning on LLFF (with `spheric_poses=False`) and on my own dataset.

I tried fine-tuning on my own dataset, which is sparsely sampled from a real scene and has a more complex trajectory than LLFF. It reports NaN errors at an early stage of fine-tuning. If this is not caused by a numerical error, does it mean that your method is unsuitable for real-scene images with complex poses? In my understanding, such scenes should simply take as long to train as NeRF, rather than produce NaN during training, right?

Do you have any advice on how to choose source views? For example, should they be very close neighbors or uniformly distributed around the scene? How much co-visibility between source views is appropriate for your method?
Hi XinWu, 1) the code currently does not support `spheric_poses=False`, since the near-far boundary must align with the cost volume construction; the cost volume is built in real-world coordinates, so you cannot normalize the near-far boundary. 2) Are you using a new data loader or something similar? NaN values are generally caused by bugs. 3) Very close neighboring views perform better. Thanks.
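For point 3), a minimal sketch of one way to pick "very close neighboring views": rank the candidates by camera-center distance. The function below is illustrative, not code from this repository, and assumes 4x4 camera-to-world poses:

```python
import numpy as np

def nearest_source_views(c2w_poses: np.ndarray, target_idx: int, k: int = 3) -> np.ndarray:
    """Return the indices of the k views whose camera centers lie closest to
    the target view's center (a viewing-angle term could be added as well)."""
    centers = c2w_poses[:, :3, 3]                     # translation column = camera center
    dists = np.linalg.norm(centers - centers[target_idx], axis=-1)
    dists[target_idx] = np.inf                        # never select the target itself
    return np.argsort(dists)[:k]
```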
Thanks for your reply! I will try the sampling strategy mentioned in 3). As for 2), I wrote a new Dataset class, referring to LLFFDataset in llff.py, and replaced its images, poses, and depth bounds with my own dataset. I will continue to check for bugs or dirty data. However, my dataset is sampled from a video that records a real indoor scene from random viewpoints, rather than facing forward like LLFF. Many training views may not be covered by the cost volume built from 3 images, so some 3D sample points cannot find a corresponding volume feature. Did you run experiments in such a situation? Do you think it could cause fine-tuning to collapse (e.g., NaN values)?
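A rough sketch of the loader approach described above: subclass the repository's LLFFDataset and swap in your own images, poses, and depth bounds. The override point and file names below are hypothetical placeholders; check llff.py for the actual hooks:

```python
import glob
import numpy as np
from llff import LLFFDataset  # the repository's LLFF loader

class MyVideoDataset(LLFFDataset):
    """Frames sampled from an indoor video instead of forward-facing LLFF captures."""

    def read_meta(self):  # hypothetical hook name
        self.image_paths = sorted(glob.glob('my_scene/images/*.png'))
        self.poses = np.load('my_scene/poses.npy')    # (N, 3, 4) camera-to-world
        self.bounds = np.load('my_scene/bounds.npy')  # (N, 2) per-view near/far
```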
Hi, I ran into the same NaN problem... Have you found a solution?
@Lemon-XQ and @XinWu98 I might have found the problem. I experienced the same error when I trained a generalized MVSNeRF on my own dataset. I found the following line to be the root of the problem:
https://github.com/apchenstu/mvsnerf/blob/1fdf6487389d0872dade614b3cea61f7b099406e/utils.py#L129
What happens here is that the sampling points are transformed into the reference camera's NDC coordinate system. However, when there are large angles between the source and target cameras, it can happen (at random) that some sampling points have z == 0 in the reference camera coordinate system. This causes a division by zero -> the NDC coordinate goes to infinity -> F.grid_sample then returns NaN raw outputs and everything goes to hell.
As a workaround, I added the following lines to https://github.com/apchenstu/mvsnerf/blob/1fdf6487389d0872dade614b3cea61f7b099406e/utils.py#L124
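The snippet itself did not survive in this thread, but the idea is to clamp the z component away from zero before the perspective division. A minimal PyTorch sketch under that assumption (function and tensor names are illustrative, not the ones in utils.py):

```python
import torch

def clamp_z_away_from_zero(points_cam: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """points_cam: (..., 3) sample points in the reference camera frame.
    Forces |z| >= eps while keeping its sign, so x/z and y/z stay finite."""
    z = points_cam[..., 2:3]
    safe_z = torch.where(z >= 0, z.clamp(min=eps), z.clamp(max=-eps))
    return torch.cat([points_cam[..., :2], safe_z], dim=-1)
```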
This should prevent the division by zero.
Btw, if you use your own dataset, make sure to change the defaults for near and far in https://github.com/apchenstu/mvsnerf/blob/1fdf6487389d0872dade614b3cea61f7b099406e/utils.py#L112
They are not always adjusted to match your dataset's specification!
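One illustrative way to avoid stale defaults, assuming your loader already knows per-view depth bounds (none of the names below come from the repository):

```python
import numpy as np

def compute_near_far(depth_bounds: np.ndarray, margin: float = 0.1):
    """depth_bounds: (N, 2) per-view [near, far] values from your dataset
    (e.g. from COLMAP). Returns scene-wide near/far with a safety margin,
    to be passed explicitly instead of relying on hard-coded defaults."""
    near = float(depth_bounds[:, 0].min() * (1.0 - margin))
    far = float(depth_bounds[:, 1].max() * (1.0 + margin))
    return near, far
```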