Hi, I'm not sure what the reason is. What happens if you turn off the SDS loss first and then supervise the unmasked regions with just the pixel color? Let's first determine whether this is a limitation of the NeRF representation or of the SDS loss.
Hi, thanks for your advice!
I tested the model with only the reconstruction loss enabled on unmasked pixels. The code is modified to:
```python
############## add guidance loss for all pixels
if args.first_stage and use_batching:  # actually, we do not need this step
    optimizer.zero_grad()
    img_loss = img2mse(rgb, target_clf)
    depth_loss = 0.0
    # only for uninpainted regions
    if args.depth_loss and not args.colmap_depth:
        depth_loss = img2mse(disp, target_inp)
        # print('------depth_loss: ', depth_loss)
    loss = img_loss + args.depth_lambda * depth_loss
else:
    optimizer.zero_grad()
    # compute the unmasked RGB loss
    loss = img2mse(rgb2, target_clf)
    if 'rgb0' in extras2 and not args.no_coarse:
        img_loss0 = img2mse(extras2['rgb0'], target_clf)
        loss = loss + img_loss0
    # print('---------------loss: ', loss)
loss.backward()
optimizer.step()
```
However, the results turn out to be a little clearer, as shown in the figure below. The first row shows the results from the full model, and the second row shows the results with only the reconstruction loss. So it seems the blur is not entirely caused by NeRF. (Our mask is generated from a 3D bounding box, so the masked area looks like a cube.)
Still, your suggestion is worth following: I will next enable the depth loss, and then the SDS loss, to see what happens. Thanks for your suggestion! I will report back once I find the reason.
Hi. I tested the model with two settings: reconstruction loss plus the depth loss, and reconstruction loss plus the SDS loss.
The results are shown below; the second row of each figure is the result using only the reconstruction loss.
Both results turn out blurry, no matter whether the depth loss or the SDS loss is added.
So I wonder:
(1) How do you obtain the GT depth for each scene? I extracted it from a pre-trained Gaussian Splatting model, then converted the depth to disparity, as is done in your code:
```python
disp_map = 1. / torch.max(1e-10 * torch.ones_like(depth), depth)
```
(2) Do you have any idea why the background turns blurry after training with the depth loss or SDS?
Thanks.
Hi,
(1) If the original dataset does not provide depth maps, we use pseudo-depth, as in SPIn-NeRF. However, we only apply the depth reconstruction loss on the unmasked regions. I think the depth you extracted from GS is also okay. Besides, in our tests, depth SDS may fail (see the appendix of our paper). (2) How about using a small RGB SDS weight?
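For concreteness, here is a minimal sketch of that combination: a depth reconstruction loss restricted to unmasked pixels (in disparity space, matching the snippet above) plus a down-weighted RGB SDS term. The function name, tensor layout, mask convention, and the default weights below are assumptions for illustration, not the released training code.

```python
import torch

def masked_depth_and_sds_loss(disp_pred, disp_gt, sds_loss, mask,
                              depth_lambda=0.1, sds_weight=1e-3):
    """Depth reconstruction on unmasked pixels only, plus a down-weighted SDS term.

    `mask` is 1 on unmasked (observed) pixels and 0 inside the inpainting region;
    `sds_loss` is whatever scalar the SDS guidance produces for the current batch.
    """
    unmasked = mask > 0.5
    # depth supervision only where the scene is actually observed
    depth_loss = torch.mean((disp_pred[unmasked] - disp_gt[unmasked]) ** 2)
    # keep the SDS weight small to limit the blur it introduces
    return depth_lambda * depth_loss + sds_weight * sds_loss
```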
Thanks for your suggestion! We didn't use the depth SDS for supervision; following your code release, we use the normal SDS. I will try reducing the RGB SDS weight.
All right... I recently discovered that there were some issues with the setup of my previous experiments. In fact, this blurriness can likely be attributed to the inherent limitations of the NeRF representation itself.
We prepared two sets of training data based on Mip-NeRF 360: one where we retained training views within a narrow 30-degree range around the frontal angle (6-8 images per scene, similar to the viewing range of the SPIn-NeRF dataset), and another with a broader scope, preserving training views within a 120-degree frontal cone (50-60 images per scene, significantly exceeding the viewing range of the SPIn-NeRF dataset).
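For reference, the frontal-cone splits were produced roughly as in the sketch below (my own preprocessing script, not part of the released code). The camera convention (-z as the viewing direction) and the helper name are assumptions; the half-angle is 15 degrees for the 30-degree split and 60 degrees for the 120-degree split.

```python
import numpy as np

def prune_to_frontal_cone(c2w, max_half_angle_deg):
    """Keep only the cameras whose viewing direction deviates from the mean
    viewing direction by less than `max_half_angle_deg` degrees.

    `c2w` is an (N, 3, 4) array of camera-to-world matrices whose third column
    (negated) is assumed to be the viewing direction.
    """
    view_dirs = -c2w[:, :, 2]
    view_dirs = view_dirs / np.linalg.norm(view_dirs, axis=-1, keepdims=True)
    mean_dir = view_dirs.mean(axis=0)
    mean_dir = mean_dir / np.linalg.norm(mean_dir)
    cos_dev = view_dirs @ mean_dir          # cosine of each view's deviation
    keep = cos_dev >= np.cos(np.deg2rad(max_half_angle_deg))
    return np.where(keep)[0]                # indices of the retained views
```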
It appears that our prior experimental results may have inadvertently mixed these two datasets. 😠 Therefore, I conducted separate experiments on each dataset, using only the reconstruction loss:
The results indicate that when the training views are limited to the 30-degree range, the reconstruction is notably clear. However, when the range is expanded to 120 degrees, the results become considerably blurrier, particularly for elements like grass, distant shrubs, and tree leaves. So the aforementioned blurriness likely stems from the inherent limitation of vanilla NeRF in reconstructing scenes with wide observation angles.
Hi. Thanks for your impressive work! But I wonder, have you run your model on the Mip-NeRF 360 dataset?
I conducted some experiments on the Mip-NeRF 360 dataset. However, the results in both the foreground and the background seem blurry. Do you have any idea why?
Is it caused by a limitation of the NeRF representation itself? I have already pruned the training views to lie within a 120-degree cone, so it should now be a forward-facing dataset. Do you have any idea how to solve this problem?
BTW, is the coordinate system used in `poses_bounds.npy` different from the original COLMAP-estimated coordinates? After training the scenes, I used a sequence of test cameras to render a video. This camera pose sequence ran successfully in Gaussian Splatting (I extracted the poses directly from it), but it failed here. Here are my results. They are much blurrier than the Gaussian Splatting reconstructions. Thanks!