Question about panorama generation at inference

Tangshitao / MVDiffusion

MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion, NeurIPS 2023 (spotlight)


Question about panorama generation at inference #3

Closed: lukasHoel closed this issue 1 year ago.

lukasHoel commented 1 year ago

Thanks for sharing this interesting work! :) After reading about panorama generation in the paper, I have the following question:

How is the correspondence t^l between the frames (Fig. 3) obtained at inference/test-time? Is it given as an input in every denoising step?

Does this mean you do the following at inference/test time?

1. Take any existing panorama and project it to 8 images.
2. Compute correspondences between the 8 images.
3. Use those correspondences to generate 8 new images from random noise with the generation module.

Thank you for clarifying!

Tangshitao commented 1 year ago

The correspondences are the same for all panoramas. For how the panorama projection works, see https://www.cambridgeincolour.com/tutorials/image-projections.htm
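Why the correspondences can be fixed: the perspective views of a panorama share one camera center and differ only by rotation, so the pixel mapping between two views is a homography H = K R_dst^T R_src K^-1 that depends on the camera geometry alone, never on the image content. Below is a minimal NumPy sketch of that idea. The view setup (8 views of 512x512 px, 90° FoV, yaw spaced 45° apart) and all function names (`intrinsics`, `rot_y`, `rotation_homography`, `map_pixel`) are illustrative assumptions, not taken from the MVDiffusion code.

```python
import numpy as np

def intrinsics(width: float, fov_deg: float) -> np.ndarray:
    """Pinhole intrinsics for a square image with the principal point at its center."""
    f = (width / 2) / np.tan(np.deg2rad(fov_deg) / 2)
    c = width / 2
    return np.array([[f, 0.0, c], [0.0, f, c], [0.0, 0.0, 1.0]])

def rot_y(deg: float) -> np.ndarray:
    """Camera-to-world rotation for a camera turned `deg` degrees about the vertical axis."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rotation_homography(K: np.ndarray, yaw_src: float, yaw_dst: float) -> np.ndarray:
    """Pixel mapping from the source view to the destination view.

    Both cameras share one center (pure rotation), so back-project with K^-1,
    rotate source-camera rays into the destination camera, and re-project with K.
    """
    return K @ rot_y(yaw_dst).T @ rot_y(yaw_src) @ np.linalg.inv(K)

def map_pixel(H: np.ndarray, u: float, v: float):
    """Return (u', v') in the destination view, or None if the ray points behind it."""
    x, y, w = H @ np.array([u, v, 1.0])
    if w <= 0:
        return None
    return x / w, y / w

# Hypothetical setup: 8 views, 512 px, 90 deg FoV, 45 deg yaw spacing.
K = intrinsics(512, 90.0)
yaws = [45.0 * i for i in range(8)]
H01 = rotation_homography(K, yaws[0], yaws[1])  # view 0 -> view 1
H34 = rotation_homography(K, yaws[3], yaws[4])  # view 3 -> view 4
```

Here `np.allclose(H01, H34)` holds because only the relative yaw enters the formula, and `map_pixel(H01, 256, 256)` sends the center of view 0 to the left edge of view 1 (about `(0, 256)`). Since no pixel values are involved, the same matrices can be precomputed once and reused for every panorama at every denoising step.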

lukasHoel commented 1 year ago

I see, thanks for the link. Now I understand why a fixed homography matrix is used. May I ask a follow-up question: did you see limitations from fixing the correspondences, e.g. always using 8 frames with the same amount of overlap? Concretely, does it limit the layouts that can be generated in any way? Thanks again :)
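To put a number on "the same amount of overlap": with a fixed rotation-only setup, every adjacent pair of views overlaps by exactly the same fraction of pixels. The sketch below estimates that fraction by warping every pixel center of one view into its neighbor and counting how many land inside the image. It assumes the same hypothetical setup as a pure-rotation model (8 views of 512x512 px, 90° FoV, 45° yaw spacing); these values and the helper names are assumptions for illustration, not confirmed by this thread.

```python
import numpy as np

# Hypothetical view setup (assumption, not from the thread).
WIDTH = 512
FOV_DEG = 90.0
YAW_STEP_DEG = 45.0
NUM_VIEWS = 8

def intrinsics(width: float, fov_deg: float) -> np.ndarray:
    """Pinhole intrinsics for a square image with the principal point at its center."""
    f = (width / 2) / np.tan(np.deg2rad(fov_deg) / 2)
    c = width / 2
    return np.array([[f, 0.0, c], [0.0, f, c], [0.0, 0.0, 1.0]])

def rot_y(deg: float) -> np.ndarray:
    """Camera-to-world rotation for a camera turned `deg` degrees about the vertical axis."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def overlap_fraction(yaw_src: float, yaw_dst: float,
                     width: int = WIDTH, fov_deg: float = FOV_DEG) -> float:
    """Fraction of source-view pixel centers that project inside the destination view."""
    K = intrinsics(width, fov_deg)
    H = K @ rot_y(yaw_dst).T @ rot_y(yaw_src) @ np.linalg.inv(K)
    centers = np.arange(width) + 0.5
    u, v = np.meshgrid(centers, centers)
    pts = np.stack([u.ravel(), v.ravel(), np.ones(u.size)])  # 3 x N homogeneous pixels
    q = H @ pts
    in_front = q[2] > 1e-9                # positive depth in the destination camera
    z = np.where(in_front, q[2], 1.0)     # avoid dividing by zero for rays behind it
    x, y = q[0] / z, q[1] / z
    inside = in_front & (x >= 0) & (x < width) & (y >= 0) & (y < width)
    return float(inside.mean())

# Overlap for every adjacent pair around the 360-degree ring.
fracs = [overlap_fraction(i * YAW_STEP_DEG, ((i + 1) % NUM_VIEWS) * YAW_STEP_DEG)
         for i in range(NUM_VIEWS)]
```

Under these assumptions every entry of `fracs` is the same, a little under one half (about 0.47): the right half of each view's horizon row is shared with its neighbor, and the shared area narrows toward the top and bottom corners because of perspective. The overlap is therefore fixed by the camera setup, not by what the scene contains.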

Tangshitao commented 1 year ago

Can you clarify what you mean by layout? Is it the global structure of the room?

lukasHoel commented 1 year ago

Yes

Tangshitao commented 1 year ago

I didn't see any limitation on the global structure. You can try our demo.