cure-lab / MagicDrive

[ICLR24] Official implementation of the paper “MagicDrive: Street View Generation with Diverse 3D Geometry Control”
https://gaoruiyuan.com/magicdrive/
GNU Affero General Public License v3.0

The inputs of view-conditioned generation. #4

Closed: Little-Podi closed this issue 5 months ago

Little-Podi commented 5 months ago

Hi, congratulations on your excellent work! I have a question about the view-conditioned generation in Fig. 6. How is the condition view image provided to the denoising process? Is it obtained by DDIM inversion?

flymin commented 5 months ago

Yes, we use DDIM inversion.

Little-Podi commented 5 months ago

I see. Then, which diffusion model is used to perform the inversion? As I understand it, if a single-image diffusion model is used, the inversion cannot faithfully reproduce the condition view, because the final DDIM sampling is performed by the multi-view diffusion model. Am I right?

flymin commented 5 months ago

Sorry, I made a mistake in my last reply. We did not use DDIM inversion. Instead, similar to the inpainting process, we

  1. add the sampled noise to the given view according to the scheduler and use the result as the input for noise prediction;
  2. replace the predicted noise for the given view with the ground-truth noise.
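The two steps above can be sketched as a DDIM sampling loop. This is a minimal NumPy sketch under stated assumptions, not the MagicDrive code: the linear beta schedule, the deterministic (eta = 0) DDIM update, the helper names (`add_noise`, `ddim_step`, `sample_with_given_view`), the toy denoiser, and reusing a single noise sample for the given view at every step are all hypothetical choices for illustration.

```python
import numpy as np

def make_alphas_cumprod(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed linear beta schedule (DDPM-style); returns cumulative products of alphas.
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, noise, ab):
    # Forward diffusion q(x_t | x_0): scale the clean latent and mix in noise.
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise

def ddim_step(x_t, eps, ab_t, ab_prev):
    # Deterministic DDIM update (eta = 0) from timestep t to the previous timestep.
    x0_hat = (x_t - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
    return np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps

def sample_with_given_view(denoiser, cond_latent, cond_idx, num_views,
                           num_steps=50, seed=0):
    # Hypothetical interface: denoiser(x_t, t) predicts noise for all views jointly,
    # with x_t of shape (num_views, *cond_latent.shape).
    rng = np.random.default_rng(seed)
    alphas_cumprod = make_alphas_cumprod()
    timesteps = np.linspace(len(alphas_cumprod) - 1, 0, num_steps).round().astype(int)
    x = rng.standard_normal((num_views, *cond_latent.shape))
    # Noise for the given view, sampled once and reused at every step (assumption).
    cond_noise = rng.standard_normal(cond_latent.shape)
    for i, t in enumerate(timesteps):
        ab_t = alphas_cumprod[t]
        ab_prev = alphas_cumprod[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        # Step 1: noise the given view to level t and feed it to the noise predictor.
        x[cond_idx] = add_noise(cond_latent, cond_noise, ab_t)
        eps = np.array(denoiser(x, t), copy=True)
        # Step 2: replace the predicted noise for the given view with the true noise.
        eps[cond_idx] = cond_noise
        x = ddim_step(x, eps, ab_t, ab_prev)
    return x

# Usage with a toy stand-in for the multi-view model (mixes information across views).
def toy_denoiser(x, t):
    return 0.5 * np.tanh(x) + 0.1 * x.mean(axis=0, keepdims=True)

cond = np.linspace(-1.0, 1.0, 4 * 8 * 8).reshape(4, 8, 8)
out = sample_with_given_view(toy_denoiser, cond, cond_idx=2, num_views=6)
print(np.allclose(out[2], cond))  # True: the given view is reproduced
```

Because the given view's predicted noise is replaced with the exact noise used to corrupt it, the DDIM estimate of the clean latent for that view equals the given latent at every step. The view is therefore reproduced without any inversion, while the other views are denoised with it present in the model input.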

Little-Podi commented 5 months ago

Sounds effective. Thanks for your detailed reply.