autonomousvision / gaussian-opacity-fields

Gaussian Opacity Fields: Efficient and Compact Surface Reconstruction in Unbounded Scenes
https://niujinshuchong.github.io/gaussian-opacity-fields/

Why are the images rendered by renderCUDA and integrateCUDA different? #38

Closed will-zzy closed 3 weeks ago

will-zzy commented 1 month ago

Thanks for such wonderful work! I found that the images rendered by renderCUDA and integrateCUDA are different.

Rendered by renderCUDA: (image: test_0_rgb)
Rendered by integrateCUDA: (image: integrate_clamp_0_rgb)

However, the two methods appear to follow the same process, so I would like to know why this happens. Also, have you tested your mesh extraction method on 2DGS? Thank you very much!

will-zzy commented 1 month ago

I identified a bug: in forward.cu, lines 993, 1002, 1031, and 1035, T should be corner_Ts[0]. After correcting this in the code, the results of the two functions became consistent.

niujinshuchong commented 1 month ago

Hi, thanks for reporting the bug. It is fixed in the latest commits.

I haven't tested it on 2DGS, but I think GOF can also be applied to 2DGS. We just need to use a step function (where the opacity changes at the intersection depth between the ray and the 2D Gaussian) to define the opacity field.
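
For illustration, a minimal sketch of such a per-Gaussian step function along a ray (my own sketch, not code from this repository; names and signature are assumptions): zero before the ray/splat intersection depth, constant at the Gaussian's value at the intersection afterwards.

#include <cmath>

// Minimal sketch (not the repo's code): per-Gaussian opacity along a ray for a
// 2D Gaussian, defined as a step function at the ray/splat intersection depth.
float stepOpacity(float t,        // query depth along the ray
                  float t_star,   // depth of the ray / 2D-Gaussian intersection
                  float rho_star, // squared local distance (u^2 + v^2) at the intersection
                  float opacity)  // learned opacity of the Gaussian
{
    float alpha_star = opacity * std::exp(-0.5f * rho_star);
    return (t < t_star) ? 0.0f : alpha_star;  // zero before, constant after
}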

will-zzy commented 1 month ago

> Hi, thanks for reporting the bug. It is fixed in the latest commits.
>
> I haven't tested it on 2DGS, but I think GOF can also be applied to 2DGS. We just need to use a step function (where the opacity changes at the intersection depth between the ray and the 2D Gaussian) to define the opacity field.

Thanks for your reply! I have just tested it on 2DGS, but the results are not very satisfactory, which I believe is primarily due to the challenge you mentioned of defining the opacity of free-space points in 2DGS. (image: 2dgs_opacity_field)

Since 2DGS lacks a scale along the normal, this definition becomes quite difficult. Additionally, the geometric meaning of $n_i$ is somewhat confusing to me. Given that $r_g$ is already the view ray in the local Gaussian space, what is the implication of multiplying $r_g$ by $\mathrm{diag}(s^{-1})$? Thank you!

niujinshuchong commented 1 month ago

Hi, thanks for sharing the results. Could you be more specific about what you did? Are you using a pretrained 2DGS model to extract the mesh shown above? What do you mean by 2DGS lacking a normal scale?

Regarding the normal in GOF, r_g is the normal of the intersection in the local Gaussian coordinate system, and we need to transform it back to world space. To do so, we first recover the scale by multiplying with s^-1 and then multiply by the Gaussian-to-world rotation matrix to bring it into world space.
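
As a concrete illustration, a minimal sketch of that transformation (my own sketch, assuming a row-major Gaussian-to-world rotation R and per-axis scales s; not code from this repository):

#include <cmath>

struct Vec3 { float x, y, z; };

// Sketch: bring the local-frame intersection normal r_g to world space by
// applying diag(s^-1), then the Gaussian-to-world rotation, then normalizing.
Vec3 normalToWorld(const Vec3& r_g, const Vec3& s, const float R[9])
{
    Vec3 n{r_g.x / s.x, r_g.y / s.y, r_g.z / s.z};   // undo the scaling
    Vec3 w{R[0] * n.x + R[1] * n.y + R[2] * n.z,      // rotate to world space
           R[3] * n.x + R[4] * n.y + R[5] * n.z,
           R[6] * n.x + R[7] * n.y + R[8] * n.z};
    float len = std::sqrt(w.x * w.x + w.y * w.y + w.z * w.z) + 1e-8f;
    return {w.x / len, w.y / len, w.z / len};
}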

will-zzy commented 1 month ago

> Hi, thanks for sharing the results. Could you be more specific about what you did? Are you using a pretrained 2DGS model to extract the mesh shown above? What do you mean by 2DGS lacking a normal scale?
>
> Regarding the normal in GOF, r_g is the normal of the intersection in the local Gaussian coordinate system, and we need to transform it back to world space. To do so, we first recover the scale by multiplying with s^-1 and then multiply by the Gaussian-to-world rotation matrix to bring it into world space.

Of course. Since extracting the opacity field is a post-processing step, I extract it from a trained 2DGS model. The process is similar to that of GOF: following Equation 8.5, when the intersection of the Gaussian and the ray, i.e. $t^*$, occurs before $t$, the weight of this Gaussian for the point at $t$ is $G(t^*)$; when $t^*$ is after $t$, the splat's weight for the point at $t$ is $G(t)$. Defining the Gaussian value for points that are not on the splat plane is quite challenging for 2DGS, as the lack of a scale in the normal direction makes it difficult to measure distances along the normal in the local Gaussian space.
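
In other words, a minimal sketch of my reading of that rule (with G evaluated along the ray; names are my own, not code from GOF):

// Sketch: contribution of one Gaussian to the opacity of a query point at
// depth t along the ray, as described above: G(t) before the intersection
// depth t_star, and the (maximal) value G(t_star) afterwards.
float gaussianWeightAt(float t, float t_star, float G_at_t, float G_at_t_star)
{
    return (t_star <= t) ? G_at_t_star : G_at_t;   // i.e. G(min(t, t_star))
}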

Here, I first follow the 2DGS method and compute, in the uv coordinate system, the homogeneous coordinates of the x-plane and y-plane, i.e. kk and ll, which represent the intersection lines of the u-plane and v-plane with the 2DGS plane, disregarding the normal direction. $p = \text{cross}(kk, ll)$ approximately represents the homogeneous coordinates of the pixel point in the uv coordinate system, so I treat p.z as the distance of the pixel point from the 2DGS plane (in the uv coordinate system).

Since the scale in the normal direction cannot be measured, I subtract $t/\sqrt{p.x \cdot p.y}$ from p.z as the distance from the query point to the 2DGS plane, and compute the distance from the query point to the centroid of the 2DGS via the Pythagorean theorem.

The code is shown below:

for (int j = 0; !second_done && j < min(BLOCK_SIZE, toDo); j++) 
            // Iterate again over all Gaussians overlapping this tile; each iteration updates the opacity values of this pixel's query points
            {
                num_iterated++;

                if (num_iterated > last_contributor){ 
                    second_done = true; // done: all Gaussians that contributed in the first rendering pass have been visited
                    continue;
                }
                if (num_iterated != (u_int32_t)contributed_ids[num_contributed_second]){
                    // skip this Gaussian if it does not intersect the current pixel's ray
                    continue;
                } else{
                    num_contributed_second += 1;
                }

                // float* view2gaussian_j = collected_view2gaussian + j * 16;
                // float3 scale_j = collected_scale[j];
                float3 Tu = collected_Tu[j];
                float3 Tv = collected_Tv[j];
                float3 Tw = collected_Tw[j];
                float4 norm_o = collected_normal_opacity[j];
                float2 xy = collected_gs_xy[j]; // 

                // iterate over all projected points

                for (int k = 0; k < num_projected; k++){ // for each Gaussian, compute the opacity contribution to every 3D query point along this ray
                    // create the ray

                    float2 pix_local = {projected_xy[k].x, projected_xy[k].y}; // pixel coordinates of the query point
                    float2 d = {xy.x - pix_local.x, xy.y - pix_local.y};

                    float3 kk = {-Tu.x + pix_local.x * Tw.x, -Tu.y + pix_local.x * Tw.y, -Tu.z + pix_local.x * Tw.z};
                    float3 ll = {-Tv.x + pix_local.y * Tw.x, -Tv.y + pix_local.y * Tw.y, -Tv.z + pix_local.y * Tw.z};
                    // cross product of two planes is a line (i.e., homogeneous point), See Eq. (10)
                    float3 p = crossProduct(kk, ll); // the pixel point in the local Gaussian (uv) coordinate system

                    float2 s = {p.x / p.z, p.y / p.z}; // local splat coordinates
                    float rho3d = (s.x * s.x + s.y * s.y); 

                    float ray_depth = projected_depth[k];

                    float rho2d = FilterInvSquare * (d.x * d.x + d.y * d.y); 

                    // float t = (rho3d <= rho2d) ? (s.x * Tw.x + s.y * Tw.y) + Tw.z : Tw.z; // splat depth
                    float t = (s.x * Tw.x + s.y * Tw.y) + Tw.z; // splat depth
                    bool flag = false;
                    if (t > ray_depth){  
                        // Eq. 8.5: the opacity of query point k along the ray: for every Gaussian j contributing to this pixel whose intersection lies behind k, evaluate j's alpha at k's depth and accumulate alpha * T into k's opacity

                        t = ray_depth;
                        flag = true;
                        // continue;
                    }

                    float rho = min(rho3d, rho2d);
                    // float rho = rho3d;
                    if (flag){

                        // float3 kk = {-Tu.x + (t + pix_local.x) * Tw.x, -Tu.y + (t + pix_local.x) * Tw.y, -Tu.z + (t + pix_local.x) * Tw.z};
                        // float3 ll = {-Tv.x + (t + pix_local.y) * Tw.x, -Tv.y + (t + pix_local.y) * Tw.y, -Tv.z + (t + pix_local.y) * Tw.z};
                        float3 kk = {-Tu.x + pix_local.x * Tw.x, -Tu.y + pix_local.x * Tw.y, -Tu.z + pix_local.x * Tw.z};
                        float3 ll = {-Tv.x + pix_local.y * Tw.x, -Tv.y + pix_local.y * Tw.y, -Tv.z + pix_local.y * Tw.z};
                        float3 p = crossProduct(kk, ll); 
                        p.z -= t / sqrt(p.x * p.y);
                        // p.z *= sqrt(p.x * p.y);
                        rho += p.z * p.z;
                    }
                    float power = -0.5f * rho;

                    float alpha = min(0.99f, norm_o.w * exp(power));

                    // TODO check here
                    if (alpha < 1.0f / 255.0f){
                        continue;
                    }

                    float test_T = point_Ts[k] * (1 - alpha);

                    point_alphas[k] += alpha * point_Ts[k];

                    point_Ts[k] = test_T;
                }
            }

Thanks for your answer! However, according to Equation (3), shouldn't $r_g$ represent the viewing direction, i.e. the representation of $r$ in the Gaussian coordinate system, or is my understanding incorrect? In fact, after the transformation in Equation (3), has $r_g$ become parallel to one of the Gaussian axes?

niujinshuchong commented 1 month ago

Hi, defining the opacity field for a 2D Gaussian is simpler than what you implemented. I mean something like this: (image) It is a step function that is zero before the intersection depth and then jumps to the opacity value evaluated at the 2D Gaussian for this ray.

This definition ignores the low-pass filter in 2DGS and might need to be adapted accordingly for the low-pass filter.

For r_g in GOF, it is not parallel to a Gaussian axis. After applying the scale back, the ray direction and the intersection plane change. However, the normal is always perpendicular to the intersection plane, as illustrated in the following figure. (image)

will-zzy commented 1 month ago

> Hi, defining the opacity field for a 2D Gaussian is simpler than what you implemented. I mean something like this: (image) It is a step function that is zero before the intersection depth and then jumps to the opacity value evaluated at the 2D Gaussian for this ray.
>
> This definition ignores the low-pass filter in 2DGS and might need to be adapted accordingly for the low-pass filter.
>
> For r_g in GOF, it is not parallel to a Gaussian axis. After applying the scale back, the ray direction and the intersection plane change. However, the normal is always perpendicular to the intersection plane, as illustrated in the following figure. (image)

Thank you for your response! I now have a clear understanding of the geometric details involved in calculating the normals. Indeed, the construction of the step function is crucial, and I've noticed that the smoothness at the step significantly impacts the quality of mesh reconstruction. If the step function is too sharp, for example by adding a continue at forward.cu line 1214, the step function reaches its maximum only when $t > t^*$ and is zero for $t \leq t^*$, like:

if (t > ray_depth){                     
    t = ray_depth;
    continue;
}

and the result: (image: sharp_step_function)

Additionally, for 2DGS, since the intersection of the ray with the 2DGS plane requires querying the Gaussian weight at points before the intersection and outside the plane, scale information along the normal is needed. Using merely the Euclidean distance from the query point to the Gaussian centroid can make the step function overly smooth, leading to a decline in mesh quality. Moreover, the smaller number of points in 2DGS compared to GOF (about 1,000,000 vs. 2,500,000) also appears to be a critical factor affecting the quality of the mesh.

hbb1 commented 1 month ago

@will-zzy I believe the proposed technique can be applied to 2DGS as well. There is no evidence that GOF needs a 3D blob (nor does the paper use any regularization encouraging a 3D blob representation), so it should apply to general Gaussians. From my side, I think of it as a space-carving algorithm: each vertex's scalar value (either SDF or opacity) is determined through a min, while in TSDF fusion it is determined through a weighted sum.

will-zzy commented 1 month ago

> @will-zzy I believe the proposed technique can be applied to 2DGS as well. There is no evidence that GOF needs a 3D blob (nor does the paper use any regularization encouraging a 3D blob representation), so it should apply to general Gaussians. From my side, I think of it as a space-carving algorithm: each vertex's scalar value (either SDF or opacity) is determined through a min, while in TSDF fusion it is determined through a weighted sum.

Thank you for your work and response! My understanding is that the opacity value at each tetrahedron vertex is derived from one-dimensional step functions calculated along multiple rays passing through that vertex. However, I encountered a challenge in implementing this on 2DGS: when a ray intersects the 2DGS plane, a one-dimensional step function must be defined along that ray. (image: 2dgs)

In the 2DGS computation, the homogeneous coordinates $h_u$ in the 2DGS coordinate system represent the x-plane, or u-plane (a three-dimensional plane in homogeneous coordinates). Since $h_u.z = 0$ implies that the component of the plane's normal along the z-axis (perpendicular to the uv plane) is zero, the intersection line of $h_u$ and $h_v$ must be perpendicular to the 2DGS plane. Defining the weights for points not on the 2DGS plane becomes critical at this point, and the lack of scale information along the normal complicates this. Using the Euclidean distance in the camera coordinate system as an approximation gives results like the ones shown on the truck scene. (This result may also be influenced by the smaller number of points in 2DGS compared to GOF; in the truck scene, for instance, 2DGS has about 1,000,000 points while GOF has about 2,500,000.)

The step function could be defined such that, along the ray's path, a point before the intersection with the splat gets a weight of zero, and a point after the intersection gets the weight of the intersection point. However, I have found that such a sharp step function diminishes the quality of mesh reconstruction, a phenomenon also observed in GOF.

If there are inaccuracies in my understanding, please kindly provide guidance!

hbb1 commented 1 month ago

@will-zzy From what I understand, you can get a point's opacity value by comparing its depth to the median depth (where the accumulated opacity is 0.5). If the point's depth is larger than the median depth along the ray, it is classified as inside, otherwise outside.

More specifically, for the $i$-th view used for fusion, we have its median depth map $D_i$, and any point $p$ under this viewpoint can be written as $p_i = (x_i z_i, y_i z_i, z_i)$.

  1. $O(p_i) = 0$ if $D_i(x_i, y_i) - z_i > 0$, else $1$; this determines inside vs. outside under view $i$.
  2. $O(p) = \min_i O(p_i)$; we fuse all frames to get the opacities (a minimal sketch is given below).
  3. Perform DMTet and binary search.
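
A minimal sketch of this fusion rule (the camera model, data layout, and names here are my assumptions, not code from this repository; the query point is assumed to already be expressed in each view's camera coordinates):

#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

// Sketch: occupancy of a query point fused over all views. A point is "inside"
// under view i if its depth exceeds the rendered median depth at its projected
// pixel; the fused occupancy is the minimum over views.
struct View {
    std::vector<float> medianDepth; // D_i, row-major, size = width * height
    int width, height;
    float fx, fy, cx, cy;           // pinhole intrinsics (assumed)
};

float fusedOccupancy(const std::vector<View>& views,
                     const std::vector<std::array<float, 3>>& pInCam) // point in each view's camera frame
{
    float O = 1.0f;
    for (size_t i = 0; i < views.size(); ++i) {
        const View& v = views[i];
        float x = pInCam[i][0], y = pInCam[i][1], z = pInCam[i][2];
        if (z <= 0.0f) continue;                            // behind the camera: no evidence
        int u = (int)std::lround(v.fx * x / z + v.cx);
        int w = (int)std::lround(v.fy * y / z + v.cy);
        if (u < 0 || u >= v.width || w < 0 || w >= v.height) continue;
        float Di = v.medianDepth[w * v.width + u];
        float Oi = (Di - z > 0.0f) ? 0.0f : 1.0f;           // in front of the median depth: outside
        O = std::min(O, Oi);
    }
    return O;
}
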
hbb1 commented 1 month ago

@will-zzy Would this implementation mathematically equate to the step_function described above?

niujinshuchong commented 4 weeks ago

@will-zzy I think one simple solution is to convert the 2D Gaussians to 3D Gaussians by adding a small value as the third scale. The scale can be determined in a similar way to how the 3D filter size is determined in mip-splatting. You can find the code here: https://github.com/autonomousvision/gaussian-opacity-fields/blob/main/scene/gaussian_model.py#L202

After that, you can run GOF's mesh extraction.
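
For illustration only, a rough sketch of that idea (the constant k and the exact relation are my assumptions; see the linked gaussian_model.py for the actual 3D filter computation):

// Sketch: give each 2D Gaussian a small third scale so it can be treated as a
// 3D Gaussian during mesh extraction. In the spirit of mip-splatting's 3D
// filter, the added scale is tied to the sampling interval of the closest
// training camera, roughly depth / focal, times a tunable constant k.
float thirdScaleFromFilter(float minDepthToCameras, // closest depth among training cameras seeing the Gaussian
                           float focalLength,       // focal length in pixels
                           float k = 0.2f)          // hypothetical constant, to be tuned
{
    return k * minDepthToCameras / focalLength;
}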

will-zzy commented 3 weeks ago

@niujinshuchong @hbb1 Thank you for your responses! My questions have been resolved, so I will now close this issue.

niujinshuchong commented 3 weeks ago

@will-zzy How do you address it in the end? Could you share your results and solution?

will-zzy commented 3 weeks ago

@niujinshuchong Certainly! In my testing, using 2DGS with DMTet + binary search consistently failed to yield satisfactory results, which I believe is related to the initial number of points. To illustrate this, I conducted the following experiments with a pre-trained 2DGS (1,062,711 points) and GOF (2,506,796 points) under three configurations (hereafter, the tetrahedral vertices used for querying the SDF are referred to as query points):

A. Using the method mentioned by @hbb1, where, given the pixel's median depth / rendering depth $D_m$ / $D_v$, a query point before the median/rendering depth gets opacity 0 and a query point after it gets opacity 1.

> @will-zzy From what I understand, you can get a point's opacity value by comparing its depth to the median depth (where the accumulated opacity is 0.5). If the point's depth is larger than the median depth along the ray, it is classified as inside, otherwise outside.
>
> More specifically, for the $i$-th view used for fusion, we have its median depth map $D_i$, and any point $p$ under this viewpoint can be written as $p_i = (x_i z_i, y_i z_i, z_i)$.
>
>   1. $O(p_i) = 0$ if $D_i(x_i, y_i) - z_i > 0$, else $1$; this determines inside vs. outside under view $i$.
>   2. $O(p) = \min_i O(p_i)$; we fuse all frames to get the opacities.
>   3. Perform DMTet and binary search.

Since the rendering depth gave better results than the median depth in both methods, only the outcomes using the rendering depth are shown here.

2DGS: (image: 2DGS_binbin_way)
GOF: (image: GOF_binbin_way)

This approach yielded suboptimal results for both 2DGS and GOF. I believe that while this method shares principles with TSDF fusion, it is limited by the number of available sampling points, resulting in an insufficient spatial sampling rate.

B. Using GOF's opacity estimation method for 2DGS: when calculating the opacity of the j-th 2DGS $G_j$ at a query point, the distance between the query point and the 2DGS is measured as either the Euclidean distance, the distance scaled by the 2DGS scales, or the distance scaled by the 3D filter. These three metrics produced meshes of similar quality, characterized by rough triangular facets, which I believe is related to the initial tetrahedral vertices.

2DGS: (image: 2DGS_3dfilter_distance)

C. Using the 2DGS points as sampling points (with rotation and scale also taken from 2DGS, and the normal scale set to 0.2), extracting opacity from GOF's Gaussians, and performing DMTet + binary search (integrating using all of GOF's Gaussians).

GOF with 2DGS init points: (image: GOF_2dgs_init_points)

GOF with GOF init points: (image: GOF_origin)

It is evident that using the initial 2DGS points and extracting the mesh with GOF's Gaussians also results in similar large-facet issues. Therefore, my conclusion is that for DMTet + binary search, the number of initial points (the spatial sampling rate) significantly impacts the quality of the extracted mesh.

niujinshuchong commented 3 weeks ago

@will-zzy Thanks for sharing the results. The initial points definitely matter, and therefore we use the center point of each Gaussian (which has maximal opacity) and corner points at 3x the scale (which have minimal opacity).
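
For illustration, a minimal sketch of that kind of candidate point set (my own assumptions: a row-major Gaussian-to-world rotation R, per-axis scales s, and axis-aligned corners of the 3-sigma box in the local frame; not code from this repository):

#include <array>
#include <vector>

struct Point3 { float x, y, z; };

// Sketch: candidate vertices for tetrahedralization from one Gaussian: its
// center (maximal opacity) plus the eight corners of its 3-sigma box
// (minimal opacity), mapped from the local frame to world space.
std::vector<Point3> candidatePoints(const Point3& center,
                                    const std::array<float, 9>& R, // row-major rotation
                                    const Point3& s)               // per-axis scales
{
    std::vector<Point3> pts{center};
    for (int sx = -1; sx <= 1; sx += 2)
        for (int sy = -1; sy <= 1; sy += 2)
            for (int sz = -1; sz <= 1; sz += 2) {
                Point3 c{3.0f * sx * s.x, 3.0f * sy * s.y, 3.0f * sz * s.z}; // corner in local frame
                pts.push_back({center.x + R[0] * c.x + R[1] * c.y + R[2] * c.z,
                               center.y + R[3] * c.x + R[4] * c.y + R[5] * c.z,
                               center.z + R[6] * c.x + R[7] * c.y + R[8] * c.z});
            }
    return pts;
}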