Aligning initial input image with produced mesh

Hello! Let me first say congrats and thanks for all your work, InstantMesh is absolutely amazing!

I am wondering about the possibility of aligning the initial input image with the produced mesh, specifically extracting the camera pose of the input image which could then be applied to the mesh such that it "looks the same". I know that One-2-3-45 has this capability (code here), and one of their multi-view diffusion-created images is a recreated version of the initial image at a guessed pose. I also believe that One-2-3-45 builds their poses relative to this initial image's pose calculation.

It seems like Zero123++ (which InstantMesh uses as the multi-view diffusion model) utilizes only the content of the initial image and doesn't bother figuring out a decent "absolute" pose of the initial image, but I haven't looked too closely into the Zero123++ code quite yet.

If InstantMesh doesn't immediately have this capability, could you point me in the general direction of a solution? For example, would it work to use the "elevation_estimate" as seen in One-2-3-45, combined with a 0 degree azimuth? (It looks like the 6 poses produced here assume the initial image's azimuth is 0 degrees.)
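To make the idea concrete, here is a rough sketch of what I have in mind: given an estimated elevation and an assumed azimuth of 0, build a camera-to-world matrix for a camera on a sphere looking at the origin. Everything here is my own assumption rather than InstantMesh or Zero123++ code: the function name `camera_pose`, the z-up world, the OpenGL-style camera axes (x right, y up, looking along -z), and the `radius` value would all need to match whatever conventions the mesh and renderer actually use.

```python
import math


def _normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def camera_pose(elevation_deg, azimuth_deg=0.0, radius=1.0):
    """Hypothetical helper: 4x4 camera-to-world matrix as nested lists.

    Assumptions (not taken from any library): z-up world, camera placed on
    a sphere of the given radius, looking at the origin, OpenGL-style axes.
    Degenerate at elevation = +/-90 degrees, where "up" is parallel to the view.
    """
    el = math.radians(elevation_deg)
    az = math.radians(azimuth_deg)

    # Camera position on the sphere.
    eye = (
        radius * math.cos(el) * math.cos(az),
        radius * math.cos(el) * math.sin(az),
        radius * math.sin(el),
    )

    # Camera +z points away from the target (the camera looks along -z).
    z_axis = _normalize(eye)
    x_axis = _normalize(_cross((0.0, 0.0, 1.0), z_axis))  # camera right
    y_axis = _cross(z_axis, x_axis)                        # camera up

    # Columns are the camera axes and position expressed in world space.
    rows = [[x_axis[i], y_axis[i], z_axis[i], eye[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


# Example: estimated elevation of 30 degrees, azimuth assumed to be 0.
pose = camera_pose(elevation_deg=30.0, azimuth_deg=0.0, radius=2.0)
```

Rendering the mesh from `pose` should then roughly reproduce the input view, provided the elevation estimate is decent and the axis conventions and camera distance line up with the ones used for the generated views.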

Thanks for any help!

TencentARC / InstantMesh

Aligning initial input image with produced mesh #15