cure-lab / MagicDrive

[ICLR24] Official implementation of the paper “MagicDrive: Street View Generation with Diverse 3D Geometry Control”
https://gaoruiyuan.com/magicdrive/
GNU Affero General Public License v3.0

The inputs of view-conditioned generation. #4

Closed by Little-Podi 10 months ago

Little-Podi commented 10 months ago

Hi, congrats on your excellent work! I have a question regarding the view-conditioned generation in Fig. 6. I am wondering how the condition view image is provided to the denoising process. Is it generated by DDIM inversion?

flymin commented 10 months ago

Yes, we use DDIM inversion.

Little-Podi commented 10 months ago

I see. Then, which diffusion model is used to perform the inversion? From my understanding, if the single-image diffusion model is used for inversion, the result cannot faithfully reproduce the conditional view, because the final DDIM sampling is run by the multi-view diffusion model. Am I right?

flymin commented 10 months ago

Sorry, I made a mistake in my last reply. We did not use DDIM inversion. Instead, similar to the inpainting process, we

  1. add the sampled noise to the given view according to the scheduler and use it as the input for noise prediction;
  2. replace the predicted noise with the ground truth for the given view.
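
For readers following along, here is a minimal sketch of the two steps above. It is not taken from the MagicDrive code and rests on several assumptions: "ground truth" is read as the noise that was actually added to the given view, fresh noise is drawn at every timestep, the scheduler update is a deterministic DDIM-style step, and latents are flat lists of floats. The names `sample_with_given_view` and `dummy_noise_predictor` are hypothetical; the latter only stands in for the multi-view denoiser.

```python
import math
import random


def make_alphas_cumprod(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a toy linear beta schedule."""
    out, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0, eps, ab):
    """Forward noising: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps."""
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]


def ddim_step(x_t, eps, ab_t, ab_prev):
    """Deterministic DDIM-style update (assumed scheduler, not confirmed by the thread)."""
    out = []
    for x, e in zip(x_t, eps):
        x0_pred = (x - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
        out.append(math.sqrt(ab_prev) * x0_pred + math.sqrt(1.0 - ab_prev) * e)
    return out


def dummy_noise_predictor(views, t):
    """Hypothetical stand-in for the multi-view denoiser: one noise estimate per view."""
    return [[0.1 * x for x in view] for view in views]


def sample_with_given_view(x0_given, given_idx, num_views, predictor,
                           num_steps=50, seed=0):
    rng = random.Random(seed)
    abar = make_alphas_cumprod(num_steps)
    dim = len(x0_given)
    # Every view starts from Gaussian noise; the given view is overwritten below.
    views = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_views)]
    for t in reversed(range(num_steps)):
        ab_t = abar[t]
        ab_prev = abar[t - 1] if t > 0 else 1.0
        # Step 1: noise the given view to level t and feed it to the predictor.
        eps_gt = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        views[given_idx] = add_noise(x0_given, eps_gt, ab_t)
        eps_pred = predictor(views, t)
        # Step 2: for the given view, use the noise that was actually added
        # instead of the model's prediction.
        eps_pred[given_idx] = eps_gt
        views = [ddim_step(v, e, ab_t, ab_prev) for v, e in zip(views, eps_pred)]
    return views
```

Calling `sample_with_given_view([0.5, -1.0, 2.0, 0.0], given_idx=1, num_views=3, predictor=dummy_noise_predictor)` returns three views. Under these assumptions the given view is recovered up to floating-point error, because the clean-sample estimate computed from the true noise equals the original view, while the other views are driven by the predictor, which sees the noised given view at every step.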

Little-Podi commented 10 months ago

Sounds effective. Thanks for your detailed reply.