Tangshitao / MVDiffusion

MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion, NeurIPS 2023 (spotlight)

Inquiries about Methodology in Paper #5

Open sangminkim-99 opened 11 months ago

sangminkim-99 commented 11 months ago

Hi, @Tangshitao!

I want to express my appreciation once again for sharing this exceptional work! I am really interested in fully understanding the paper and have a couple more questions.

  1. What do Q, K, and V represent in Equation 2?

Based on my understanding, the aim is to compute cross-attention between overlapping frames, so the query should come from the source features while the key and value come from the target features. However, Equation 2 already denotes the features explicitly as $\bar{F}$, which leaves me unsure what Q, K, and V themselves stand for. I sketch my current reading in code after the two questions below.

  2. How are occlusions handled when $K > 1$ in geometry-conditioned image generation?

In Figure 3, you present the re-projection scheme used to obtain, for each source feature location, the corresponding locations in the neighboring views. In many cases, however, occlusions occur, so a valid correspondence cannot be guaranteed. This may not be a problem for panorama generation, where correspondences are assured, but for geometry-conditioned image generation with $K > 1$, handling occlusions becomes crucial. Have you explored the $K > 1$ setting in this context and found a way to address occlusions?
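To make question 1 concrete, here is a minimal PyTorch sketch of how I currently read Equation 2. The class name, the tensor shapes, and the flattening of each correspondence neighborhood into M candidate target pixels per source pixel are my own simplifications, not the released code:

```python
import torch
import torch.nn as nn


class CorrespondenceAwareAttentionSketch(nn.Module):
    """My reading of Eq. 2: W_Q, W_K, W_V are learnable projections; the
    query comes from the source feature F_bar (feature + positional
    encoding) and the key/value come from the neighbor features F_bar_l
    sampled at the reprojected locations."""

    def __init__(self, dim: int):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)  # learnable query weights
        self.w_k = nn.Linear(dim, dim, bias=False)  # learnable key weights
        self.w_v = nn.Linear(dim, dim, bias=False)  # learnable value weights

    def forward(self, f_bar_src: torch.Tensor, f_bar_nbr: torch.Tensor) -> torch.Tensor:
        # f_bar_src: (B, N, C)    source pixels, feature + positional encoding
        # f_bar_nbr: (B, N, M, C) M candidate correspondences per source pixel
        q = self.w_q(f_bar_src).unsqueeze(2)                  # (B, N, 1, C)
        k = self.w_k(f_bar_nbr)                               # (B, N, M, C)
        v = self.w_v(f_bar_nbr)                               # (B, N, M, C)
        attn = torch.softmax((q * k).sum(-1) / k.shape[-1] ** 0.5, dim=-1)
        return (attn.unsqueeze(-1) * v).sum(dim=2)            # (B, N, C)
```

If Q, K, and V in the equation are something other than these projections of $\bar{F}$, please correct me.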

Thank you once again for your time and for considering my questions. I am eager to gain a deeper understanding of your work.

Best regards, Sang Min Kim

Tangshitao commented 11 months ago
  1. They are the learnable weights of the query, key, and value projections.
  2. We use a positional encoding to encode the depth check in geometry-conditioned generation.
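
To make the second point more concrete, here is a rough, simplified sketch of what a depth check expressed through a positional encoding can look like. The function names and the exact form of the encoding are illustrative only, not the code in this repository:

```python
import torch


def fourier_encode(x: torch.Tensor, num_freqs: int = 8) -> torch.Tensor:
    """Standard Fourier feature encoding of a per-pixel scalar."""
    freqs = 2.0 ** torch.arange(num_freqs, device=x.device) * torch.pi
    angles = x.unsqueeze(-1) * freqs                      # (..., num_freqs)
    return torch.cat([angles.sin(), angles.cos()], dim=-1)


def depth_check_encoding(src_depth_reproj: torch.Tensor,
                         tgt_depth_sampled: torch.Tensor,
                         eps: float = 1e-6) -> torch.Tensor:
    # src_depth_reproj: depth of the source 3D point expressed in the
    #                   target camera frame, shape (B, N)
    # tgt_depth_sampled: the target view's own depth sampled at the
    #                    reprojected pixel, shape (B, N)
    # A large relative discrepancy is the signature of an occlusion;
    # encoding it gives the attention the information needed to
    # down-weight such matches.
    rel_error = (src_depth_reproj - tgt_depth_sampled) / (tgt_depth_sampled + eps)
    return fourier_encode(rel_error)                      # (B, N, 2 * num_freqs)
```

The intuition of the sketch: when a source point is reprojected into a neighboring view, its depth can be compared with the depth the neighbor itself observes at that pixel; a large discrepancy indicates occlusion, and encoding it lets the correspondence-aware attention learn to down-weight such matches.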

Thanks for the questions. We will correct these points in the paper. Please ask if anything else is unclear.

sangminkim-99 commented 11 months ago

Thank you for your kind reply!