Closed · ShaohuaL closed this 4 months ago
I hold that it's because the paper focuses more on distilling features from fundamental models to GS rather than improving GS via feature distilling.
I am curious why the rendered features, which do not contribute to the alpha gradient, can improve the rendering results in Table 1.
> I am curious why the rendered features, which do not contribute to the alpha gradient, can improve the rendering results in Table 1.
Good question. I think the opacity here should mainly serve the geometry; if the rendered features did contribute to the alpha gradient, the effect would be uncertain, possibly even worse.
The authors mentioned in Section 3.2:

> Our equal-weighted joint optimization approach has demonstrated that the resulting high-dimensional semantic features significantly contribute to scene understanding and enhance the depiction of physical scene attributes, such as opacity and relative positioning. See the comparison between Ours and Base 3DGS in Tab. 1.
If the rendered features do not contribute to the alpha gradient, the rendering result should be the same as plain 3DGS. I am wondering if this is a mistake in the implementation. Also, as mentioned in the NeRF-DFF paper, the multi-view inconsistency of features produced by foundation models can harm geometry, so those authors stop gradient backpropagation to the density values. This seems to be in line with the implementation but not with the paper. Could you please tell me if I have misunderstood the paper?
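For intuition, here is a minimal NumPy sketch (my own illustration, not the repo's CUDA code) of front-to-back alpha compositing and the analytic dL/dalpha it induces. The NeRF-DFF-style "stop gradient to density" then corresponds to simply not adding the feature loss's contribution to this gradient:

```python
import numpy as np

def composite(vals, alphas):
    """Front-to-back alpha compositing: out = sum_i vals[i] * alpha_i * T_i,
    with transmittance T_i = prod_{j<i} (1 - alpha_j)."""
    out = np.zeros(vals.shape[1])
    T = 1.0
    for v, a in zip(vals, alphas):
        out += v * a * T
        T *= 1.0 - a
    return out

def dL_dalpha(vals, alphas, dL_dout):
    """Analytic gradient of the compositing output w.r.t. each alpha:
    d out / d alpha_i = vals[i] * T_i
                        - sum_{j>i} vals[j] * alpha_j * T_j / (1 - alpha_i)
    (increasing alpha_i boosts Gaussian i but occludes everything behind it)."""
    n = len(alphas)
    T = np.cumprod(np.concatenate(([1.0], 1.0 - alphas)))[:n]
    grads = np.zeros(n)
    for i in range(n):
        contrib = vals[i] * T[i]
        for j in range(i + 1, n):
            contrib -= vals[j] * alphas[j] * T[j] / (1.0 - alphas[i])
        grads[i] = dL_dout @ contrib
    return grads

# Toy pixel: 3 Gaussians with RGB colors and 4-dim features (made-up numbers).
rng = np.random.default_rng(0)
alphas = np.array([0.6, 0.5, 0.3])
rgb    = rng.random((3, 3))
feats  = rng.random((3, 4))

g_rgb  = dL_dalpha(rgb,   alphas, np.ones(3))
g_feat = dL_dalpha(feats, alphas, np.ones(4))
g_full = g_rgb + g_feat   # feature loss also moves the geometry
g_stop = g_rgb            # feature loss leaves the geometry untouched
```

With the feature term included, multi-view-inconsistent feature targets pull the opacities in conflicting directions across views; dropping the term keeps the geometry driven by the photometric loss alone.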
Hi all! Thank you so much for bringing this up! We did try different versions of our code in our experiments. One thing I would like to clarify is that the purpose of Table 1 in our paper is to show that our proposed explicit feature field distillation does not affect the quality of radiance field rendering, NOT to improve the performance of novel view synthesis.
Also, there is substantial randomness due to GPU scheduling, as mentioned here. For Table 1, we redid our experiments by running each setting 5 times, excluding the max and min, and averaging the remaining 3 runs. We can still observe a ~0.8 PSNR improvement, but we do not claim this improvement as a contribution of this paper.
As for dL_dalpha in feature rendering, we have already updated this part in our code base. Sorry for the misleading initial version. For those curious about the influence of this change, here are the experimental results (PSNR) for this ablation on the Replica dataset:
| Scene    | dL_dalpha | No dL_dalpha |
| -------- | --------- | ------------ |
| Room 0   | 36.311    | 36.326       |
| Room 1   | 38.515    | 38.514       |
| Office 3 | 38.383    | 38.473       |
| Office 4 | 34.851    | 34.174       |
| Average  | 37.015    | 36.872       |
Just wanted to follow up on this: I have disabled the contribution of dL_dfeaturechannel to dL_dalpha in the updated code base. My experiments verify that only semantically meaningful features (e.g. CLIP-LSeg, SAM) benefit or leave the RGB rendering unaffected; non-semantic features (e.g. from a self-designed encoder) will degrade the radiance field rendering if this contribution is kept enabled. Therefore, for more general use of our code with any high-dimensional feature rendering, this line of code is commented out:
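To make the change concrete, here is a NumPy sketch of the back-to-front, suffix-accumulation form of the alpha gradient that tile-based 3DGS backward passes typically use (a hedged illustration of the pattern, not the actual `backward.cu`; the function and variable names are my own). The guarded line plays the role of the one that was commented out:

```python
import numpy as np

def alpha_gradients(alphas, colors, feats, dL_dC, dL_dF, include_feature_term):
    """Per-Gaussian alpha gradient via suffix accumulation:
      dL/dalpha_i = T_i * dL_dC . (c_i - accum_c_i)
                  [+ T_i * dL_dF . (f_i - accum_f_i)]   # optional feature term
    where accum_*_i re-composites everything behind Gaussian i."""
    n = len(alphas)
    T = np.cumprod(np.concatenate(([1.0], 1.0 - alphas)))[:n]
    grads = np.zeros(n)
    accum_c = np.zeros_like(colors[0])
    accum_f = np.zeros_like(feats[0])
    for i in range(n - 1, -1, -1):
        grads[i] = T[i] * (dL_dC @ (colors[i] - accum_c))
        if include_feature_term:  # the contribution now disabled in the repo
            grads[i] += T[i] * (dL_dF @ (feats[i] - accum_f))
        # fold Gaussian i into the "behind" accumulators for the next step
        accum_c = alphas[i] * colors[i] + (1.0 - alphas[i]) * accum_c
        accum_f = alphas[i] * feats[i] + (1.0 - alphas[i]) * accum_f
    return grads
```

Calling this with `include_feature_term=False` mirrors the "No dL_dalpha" setting in the table above: the feature loss still flows to the per-Gaussian features, but no longer pushes the opacities.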
I've been reviewing the implementation details in the backward.cu file and noticed that the color components contribute to dL_dalpha. However, I'm puzzled as to why the alpha-rendered feature maps do not contribute to dL_dalpha in a similar manner. Given that both the color and the feature maps use alpha rendering, I expected consistent treatment of their contributions to the gradient calculations.
Could you please clarify the design decision behind this? Is there a specific reason why alpha-rendered feature maps are excluded from contributing to dL_dalpha, or might this be an oversight in the implementation?