chenhsuanlin / signed-distance-SRN

SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images 🎯 (NeurIPS 2020)
MIT License

Sphere-like structure #3

Closed jingma-git closed 3 years ago

jingma-git commented 3 years ago

Hi, this is an excellent paper with a very neat implementation! It especially solves the pain of acquiring ground-truth signed distance fields for 3D supervision (as in IM-NET and OccNet), and it can be applied to real-world images. But some artifacts still exist. Here are my two questions:

  1. Sphere-like structures appear, especially for Pascal3D with orthographic projection, when a part is very thin, for example a chair's leg (Figure 6 in the paper). My guess is that these sphere-like structures arise because you optimize the signed distance f(zu) to match D(u) (formula (4) in the paper and formula (13) in the supplementary). These two formulas assume that the predicted 3D signed distance field should resemble its 2D counterpart, which is a circle in 2D and a sphere in 3D. But when the part is very thin, meaning only a few pixels lie in that region, the signed distance is dominated by those few pixels, and the SDF values are too small to be optimized well, leading to artifacts that look like tanghulu (a Chinese snack of candied fruit skewered on a stick). My questions are: are there other reasons for the sphere-like artifacts? And how could this artifact be eliminated (e.g., detect thin parts by estimating the width of the SDF field in the horizontal direction and assign 'focal weights' to the thin-part loss? I'm not very sure...)

  2. 'Constrain z* to fall within the last two ray-marching steps by encouraging the Nth step to be negative and the first (N-1) steps to be positive' — this sentence confuses me because it seems to assume there is only one intersection with the surface. Actually there should be two intersections: the first with the front surface, s.t. z(N-1)>0 and z(N)<0, and the second with the back surface, s.t. z(N-1)<0 and z(N)>0. So there should be two places where a sign change happens. I believe you already handle both cases, see Line144 (side = ((y0<0)^(y2>0))) in implicit.py. But the sentence in the paper is still confusing; in my opinion it should read 'encourage z(N)>0 and z(N-1)<0 for the back-surface intersection, and z(N-?)<0 and z(N-?-1)>0 for the front-surface intersection', where ?>=1 and ? also depends on the 'width of the SDF field' as mentioned before. Maybe I am wrong; I hope to see your reply.
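For reference, the sign-change test I am describing (in the spirit of the `side = ((y0<0)^(y2>0))` line in implicit.py, though not the actual repo code) can be sketched as follows; `find_sign_changes` and `sdf_vals` are hypothetical names for illustration:

```python
import numpy as np

def find_sign_changes(sdf_vals):
    """Return indices i where the SDF changes sign between steps i and i+1.

    sdf_vals: (N,) array of signed distances sampled along one ray.
    A front-surface hit goes + -> -, a back-surface hit goes - -> +.
    """
    s = np.sign(sdf_vals)
    # consecutive samples with opposite signs have a negative product
    return np.nonzero(s[:-1] * s[1:] < 0)[0]

# A ray that enters (front surface) and later exits (back surface) an object:
sdf = np.array([0.9, 0.5, 0.1, -0.2, -0.4, -0.1, 0.3, 0.8])
print(find_sign_changes(sdf))  # -> [2 5]: front hit at 2->3, back hit at 5->6
```

So a sufficiently deep ray through a watertight shape yields (at least) two crossings, which is exactly the ambiguity I am asking about.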

chenhsuanlin commented 3 years ago

Thanks for your interest! Regarding your questions:

  1. My belief is that these axis-aligned artifacts (including the blob-like artifacts in the chair legs) come from the positional encoding component of NeRF. Recent works such as random Fourier features and SIREN specifically discuss the drawbacks of this component and how to improve it. Although I haven't incorporated them into this codebase yet, I believe they would help reduce the artifacts.
  2. If you cast rays sufficiently deep, then yes, there will be at least two surface intersections (note there may be more than two, and we don't know exactly how many). SDF-SRN does not aim to find all such zero-crossings, but only the surface intersection visible to the camera, so no matter how many zero-crossings a cast ray might meet, we only optimize w.r.t. the first one. We rely on learning (from other examples) to figure out what the back-facing surfaces would look like.
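The idea in point 2 — keeping only the first (camera-visible) zero-crossing and ignoring any later ones — might be sketched like this. This is a simplified illustration, not the actual SDF-SRN code; `first_surface_depth`, `sdf_vals`, and `depths` are hypothetical names:

```python
import numpy as np

def first_surface_depth(sdf_vals, depths):
    """Depth of the first + -> - zero-crossing along a ray (the visible surface).

    Later crossings (e.g. the back surface) are ignored; linear interpolation
    locates the zero between the two bracketing samples.
    """
    s = np.sign(sdf_vals)
    hits = np.nonzero((s[:-1] > 0) & (s[1:] < 0))[0]
    if hits.size == 0:
        return None  # the ray missed the object
    i = hits[0]  # keep only the first (camera-visible) intersection
    f0, f1 = sdf_vals[i], sdf_vals[i + 1]
    t = f0 / (f0 - f1)  # fraction of the way to the zero
    return depths[i] + t * (depths[i + 1] - depths[i])

sdf = np.array([0.6, 0.2, -0.2, -0.5, 0.1])  # front hit, then a back hit
depths = np.linspace(0.0, 4.0, 5)            # uniform ray-marching steps
print(first_surface_depth(sdf, depths))      # -> 1.5 (between steps 1 and 2)
```

Here the later - -> + crossing (back surface) is never supervised directly, matching the explanation that back-facing geometry is learned from other examples rather than from this ray's loss.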

Hope this helps!

jingma-git commented 3 years ago

Thanks!