ShijieZhou-UCLA / feature-3dgs

[CVPR 2024 Highlight] Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Why does dL_dfeaturechannel not contribute to the calculation of dL_dalpha? #4

Closed ShaohuaL closed 4 months ago

ShaohuaL commented 6 months ago

I've been reviewing the implementation details in the backward.cu file and noticed that the color components contribute to dL_dalpha. However, I'm puzzled as to why the alpha-rendered feature maps do not contribute to dL_dalpha in a similar manner. Given that both color and feature maps use alpha rendering, I expected consistent treatment of their contributions in the gradient calculation.

Could you please clarify the design decision behind this? Is there a specific reason why alpha-rendered feature maps are excluded from contributing to dL_dalpha, or might this be an oversight in the implementation?
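
For context, the color term comes from the standard alpha-compositing equation (a sketch of the derivation, using the same notation as the code below; in the actual kernel the $T_i$ factor is multiplied in separately):

$$C_{ch} = \sum_i c_{i,ch}\,\alpha_i\,T_i, \qquad T_i = \prod_{j<i} (1-\alpha_j)$$

$$\frac{\partial C_{ch}}{\partial \alpha_i} = T_i\left(c_{i,ch} - \hat{c}_{i,ch}\right), \qquad \hat{c}_{i,ch} = \frac{1}{T_i\,(1-\alpha_i)}\sum_{k>i} c_{k,ch}\,\alpha_k\,T_k$$

Here $\hat{c}_{i,ch}$ is the color accumulated behind Gaussian $i$, which is exactly what accum_rec[ch] tracks, so dL_dalpha += (c - accum_rec[ch]) * dL_dchannel is the chain rule applied to this derivative. The feature map is composited by the same equation, which is why I expected an analogous term in the feature loop.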

    for (int ch = 0; ch < C; ch++)
    {
        const float c = collected_colors[ch * BLOCK_SIZE + j];
        // Update last color (to be used in the next iteration)
        accum_rec[ch] = last_alpha * last_color[ch] + (1.f - last_alpha) * accum_rec[ch];
        last_color[ch] = c;

        const float dL_dchannel = dL_dpixel[ch];
        dL_dalpha += (c - accum_rec[ch]) * dL_dchannel;
        // Update the gradients w.r.t. color of the Gaussian.
        // Atomic, since this pixel is just one of potentially
        // many that were affected by this Gaussian.
        atomicAdd(&(dL_dcolors[global_id * C + ch]), dchannel_dcolor * dL_dchannel);
    }

    for (int ch = 0; ch < NUM_SEMANTIC_CHANNELS; ch++)
    {
        const float f = collected_semantic_feature[ch * BLOCK_SIZE + j];
        // Update last semantic feature (to be used in the next iteration)
        accum_semantic_feature_rec[ch] = last_alpha * last_semantic_feature[ch] + (1.f - last_alpha) * accum_semantic_feature_rec[ch];
        last_semantic_feature[ch] = f;

        const float dL_dfeaturechannel = dL_dfeaturepixel[ch];
        // Note: no dL_dalpha accumulation here, unlike the color loop above.
        // Update the gradients w.r.t. semantic feature of the Gaussian.
        // Atomic, since this pixel is just one of potentially
        // many that were affected by this Gaussian.
        atomicAdd(&(dL_dsemantic_feature[global_id * NUM_SEMANTIC_CHANNELS + ch]), dchannel_dsemantic_feature * dL_dfeaturechannel);
    }
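
For comparison, a symmetric treatment would presumably add the feature analogue of the color term inside the feature loop, right after dL_dfeaturechannel is read. A sketch only, reusing the names from the excerpt above (this line is not in the released kernel):

    // Hypothetical analogue of the color case: rendered features would
    // then also pull on the opacity gradient.
    dL_dalpha += (f - accum_semantic_feature_rec[ch]) * dL_dfeaturechannel;
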
yihua7 commented 5 months ago

I believe it's because the paper focuses more on distilling features from foundation models into GS rather than improving GS via feature distillation.

keloee commented 5 months ago

I am curious how the rendered features, which do not contribute to the alpha gradient, can improve the rendering quality in Table 1.

ShaohuaL commented 5 months ago

> I am curious how the rendered features, which do not contribute to the alpha gradient, can improve the rendering quality in Table 1.

Good question. I think the opacity here should mainly serve the geometry; if the rendered features did contribute to the alpha gradient, the effect would be uncertain, possibly even worse.

keloee commented 5 months ago

The authors mentioned in Section 3.2

> Our equal-weighted joint optimization approach has demonstrated that the resulting high-dimensional semantic features significantly contribute to scene understanding and enhance the depiction of physical scene attributes, such as opacity and relative positioning. See the comparison between Ours and Base 3DGS in Tab. 1.

If the rendered features do not contribute to the alpha gradient, the rendering result should be the same as plain 3DGS, so I am wondering whether this is a mistake in the implementation. Also, the NeRF-DFF paper mentions that the multi-view inconsistency of features produced by foundation models can harm geometry, which is why its authors stop gradient backpropagation into the density values. That seems to be in line with the implementation here, but not with the paper. Could you please tell me if I have understood the paper wrong?

ShijieZhou-UCLA commented 4 months ago

Hi all! Thank you so much for bringing this up! We did try different versions of our code in our experiments. One thing I would like to clarify is that the purpose of Table 1 in our paper is to show that our proposed explicit feature field distillation does not affect the quality of radiance field rendering, NOT to improve the performance of novel view synthesis.

Also, there is substantial randomness due to GPU scheduling, as mentioned here. For Table 1, we redid our experiments by running each setting 5 times, excluding the max and min, and averaging the remaining 3 runs. As a result, we can still observe a ~0.8 PSNR improvement, but we do not claim this improvement as a contribution of this paper.

As for dL_dalpha in feature rendering, we have already updated this part in our code base. Sorry for the misleading initial version. For those curious about the influence of this change, I am happy to share the experimental results (PSNR) for this ablation on the Replica dataset:

| Scene    | dL_dalpha | No dL_dalpha |
|----------|-----------|--------------|
| Room 0   | 36.311    | 36.326       |
| Room 1   | 38.515    | 38.514       |
| Office 3 | 38.383    | 38.473       |
| Office 4 | 34.851    | 34.174       |
| Average  | 37.015    | 36.872       |

ShijieZhou-UCLA commented 1 month ago

Just wanted to follow up on this: I have disabled the contribution of dL_dfeaturechannel to dL_dalpha in the updated code base. My experiments verify that only semantically meaningful features (e.g. CLIP-LSeg, SAM) benefit, or at least do not affect, the RGB rendering; non-semantic features (e.g. from a self-designed encoder) will destroy the radiance field rendering if this term is kept enabled. Therefore, to keep the code generally usable for any high-dimensional feature rendering, this line of code is commented out:

https://github.com/ShijieZhou-UCLA/feature-3dgs/blob/main/submodules/diff-gaussian-rasterization-feature/cuda_rasterizer/backward.cu#L572
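
For those who want to experiment with re-enabling this term (e.g. when distilling semantically meaningful features), one minimal option is a compile-time guard around that line. A sketch only; the macro name below is hypothetical and not part of the repo:

    #ifdef FEATURE_DL_DALPHA  // hypothetical flag, define it to opt in
    // Feature contribution to the opacity gradient; per the experiments
    // above, fine for semantic features (e.g. CLIP-LSeg, SAM) but
    // harmful for arbitrary learned encoders.
    dL_dalpha += (f - accum_semantic_feature_rec[ch]) * dL_dfeaturechannel;
    #endif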