[Question about jointly the complete representation]

quan5609 commented 3 months ago

Dear authors,

Thanks for your impressive work! In the "Full Pipeline" section, you mentioned that MoSca performs "Initialize the dynamic Gaussians G and jointly optimize the complete representation including the static background Gaussians, dynamic foreground Gaussians, the deformation motion graph, and cameras".

Will you still optimize the camera poses after the BA process in section 3.5? Also, in section 3.5, you mention that "we jointly optimize a correction to depth Da[pa], consisting of per-frame global scaling factors and small per-pixel corrections". Can you provide more detailed descriptions for your depth alignment implementation and the camera poses training and representation?

JiahuiLei commented 3 months ago

Thanks for your questions. I hope the following answers are useful:

Will you still optimize the camera poses after the BA process in section 3.5?

Yes, the camera pose is also optimized through the full photometric rendering. However, because the GS renderer does not currently support gradient propagation to the focal length, the intrinsic parameters are fixed after the BA stage.

Also, in section 3.5, you mention that "we jointly optimize a correction to depth Da[pa], consisting of per-frame global scaling factors and small per-pixel corrections". Can you provide more detailed descriptions for your depth alignment implementation and the camera poses training and representation?

The depth alignment is mostly helpful for adjusting the per-frame global depth scale, but we also add a small depth correction to each pixel for completeness (as far as I have observed, the small correction does not make much difference, so this part is not highlighted in the paper). Specifically, the actual depth is dep[t,i,j] = dep0[t,i,j] * scale[t] + delta_dep[t,i,j], where scale and delta_dep are optimizable, and there is also a regularization term that keeps the absolute value of delta_dep small.
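As a rough illustration of the formula above (not the actual MoSca code), the correction can be sketched in plain Python. The function names `corrected_depth` and `delta_regularizer` are hypothetical, and the mean-absolute-value form of the regularizer is an assumption; the comment above only says that the absolute value of delta_dep is kept small. In a real pipeline scale and delta_dep would be optimizable tensors rather than lists.

```python
def corrected_depth(dep0, scale, delta_dep):
    """Apply dep[t][i][j] = dep0[t][i][j] * scale[t] + delta_dep[t][i][j]."""
    return [
        [
            [d0 * scale[t] + dd for d0, dd in zip(row0, row_d)]
            for row0, row_d in zip(frame0, frame_d)
        ]
        for t, (frame0, frame_d) in enumerate(zip(dep0, delta_dep))
    ]


def delta_regularizer(delta_dep):
    """Mean absolute per-pixel correction (assumed form of the penalty)."""
    values = [abs(dd) for frame in delta_dep for row in frame for dd in row]
    return sum(values) / len(values)


# Two frames of 2x2 depth, one global scale per frame, small per-pixel deltas.
dep0 = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [2.0, 2.0]]]
scale = [2.0, 0.5]
delta_dep = [[[0.0, 0.1], [0.0, -0.1]], [[0.0, 0.0], [0.0, 0.0]]]

dep = corrected_depth(dep0, scale, delta_dep)
reg = delta_regularizer(delta_dep)
```

Here the per-frame scale does most of the work and the deltas only nudge individual pixels, which matches the observation that the small correction makes little difference.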

The camera pose is initially parameterized as small delta poses (quaternion and translation) between neighboring frames, and is later converted to absolute poses to boost the optimization. The camera poses are also optimized during the photometric rendering stages in later iterations.
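To make the delta-to-absolute conversion concrete, here is a minimal sketch in plain Python (again, not the actual MoSca code). It assumes quaternions in (w, x, y, z) order, that frame 0 is the identity pose, and that each delta is composed on the right of the previous absolute pose; the real implementation may use a different convention. All function names are hypothetical.

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q via q * (0, v) * conj(q)."""
    w, x, y, z = q
    rotated = quat_mul(quat_mul(q, (0.0, *v)), (w, -x, -y, -z))
    return rotated[1:]


def normalize(q):
    """Rescale q to unit length so it stays a valid rotation."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def deltas_to_absolute(deltas):
    """Chain (quat, trans) deltas for frames k -> k+1 into absolute poses.

    Assumed convention: T_abs[k+1] = T_abs[k] * T_delta[k], so
    R = R_prev R_delta and t = R_prev t_delta + t_prev.
    """
    poses = [((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0))]  # frame 0: identity
    for dq, dt in deltas:
        q, t = poses[-1]
        new_t = tuple(a + b for a, b in zip(t, quat_rotate(q, dt)))
        new_q = normalize(quat_mul(q, dq))
        poses.append((new_q, new_t))
    return poses


# Two identical deltas: 90 degrees about z, then one unit along the local x axis.
q90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
poses = deltas_to_absolute([(q90, (1.0, 0.0, 0.0)), (q90, (1.0, 0.0, 0.0))])
```

With delta poses, a change to one early delta moves every later frame, whereas absolute poses let each frame be adjusted independently. That is one plausible reading of why the parameterization is switched later on.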

quan5609 commented 3 months ago

Thank you. Your explanation is really helpful.

JiahuiLei / MoSca
