cure-lab / MagicDrive

[ICLR24] Official implementation of the paper “MagicDrive: Street View Generation with Diverse 3D Geometry Control”
https://gaoruiyuan.com/magicdrive/
GNU Affero General Public License v3.0

About conditional embedding #76

Closed: qiuzidian closed this issue 3 months ago

qiuzidian commented 3 months ago

Hi, great job! I have a question: you feed the camera and box condition embeddings into the cross-attention together with the text embedding, but you did not fine-tune the cross-attention. Why? Could inputs that differ from the original text embeddings cause any problems?

flymin commented 3 months ago

Hi, the cross-attention is not trainable because we want to

  1. train as few parameters as possible;
  2. keep the generalization effect of text embeddings.

As the results show, goal 1 worked well but goal 2 failed. You can try different text prompts while keeping the other conditions the same; I did not see much difference in the outputs.
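For readers who want to see the setup concretely, here is a minimal PyTorch sketch of the idea discussed above: extra condition tokens are concatenated with the text embedding and passed through a frozen cross-attention layer, so only the condition encoders receive gradients. This is not the MagicDrive code. `ToyCrossAttention`, `cam_encoder`, `box_encoder`, and all tensor sizes are hypothetical stand-ins chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class ToyCrossAttention(nn.Module):
    """Hypothetical stand-in for a pretrained cross-attention layer."""

    def __init__(self, query_dim: int, context_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            query_dim, num_heads,
            kdim=context_dim, vdim=context_dim, batch_first=True,
        )

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # Queries come from image latents, keys/values from the condition tokens.
        out, _ = self.attn(x, context, context)
        return out


context_dim = 32  # assumed width of the text embedding
cross_attn = ToyCrossAttention(query_dim=64, context_dim=context_dim)

# Freeze the cross-attention: it is used as-is and never updated.
for p in cross_attn.parameters():
    p.requires_grad_(False)

# Trainable encoders that map raw conditions into the text-embedding space.
# Input sizes (12 and 24) are arbitrary toy values.
cam_encoder = nn.Linear(12, context_dim)
box_encoder = nn.Linear(24, context_dim)

batch = 2
text_emb = torch.randn(batch, 7, context_dim)  # stand-in for text encoder output
cam = torch.randn(batch, 1, 12)                # one camera token per sample
boxes = torch.randn(batch, 5, 24)              # five box tokens per sample
latent = torch.randn(batch, 16, 64)            # stand-in for image latents

# Concatenate along the token axis: 7 text + 1 camera + 5 box tokens = 13.
context = torch.cat([text_emb, cam_encoder(cam), box_encoder(boxes)], dim=1)

out = cross_attn(latent, context)
out.sum().backward()

# Gradients reach the encoders but not the frozen attention weights.
frozen_has_grad = any(p.grad is not None for p in cross_attn.parameters())
print("context shape:", tuple(context.shape))
print("frozen layer received gradients:", frozen_has_grad)
print("camera encoder received gradients:", cam_encoder.weight.grad is not None)
```

In this sketch the new encoders have to learn to produce tokens that the frozen layer can already interpret, which is one way to read goal 1 above (few trainable parameters).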