LTH14 / mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
MIT License

Is diffusion position embedding necessary? #9

Open chrisway613 opened 1 month ago

chrisway613 commented 1 month ago

Thanks for your excellent work! I've seen that before the conditioning vector z enters the diffusion model, a positional embedding is added to it, as in the code below: https://github.com/LTH14/mar/blob/e0cccf8341aa3276069a5bf2eb4bcb83bebafa4e/models/mar.py#L229 Is this crucial? Is there any explanation for it? I'd really appreciate that.

LTH14 commented 1 month ago

Thanks for your interest! Actually, this position embedding is not a crucial part 😂. I once thought we needed to tell the DiffLoss which position it is generating, so I added this position embedding. But later I found that the condition z should already contain the position information, because of the position embedding added at the beginning of the decoder. However, since all of our pre-trained models were trained with self.diffusion_pos_embed_learned, I just keep it in the code.
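For readers following along, here is a minimal sketch of the mechanism being discussed: a learned position embedding that is broadcast-added to the conditioning vector z before it reaches the diffusion head. The class name, sequence length, and embedding width here are illustrative assumptions, not the repository's actual values; only the parameter name diffusion_pos_embed_learned mirrors the code referenced above.

```python
import torch
import torch.nn as nn


class DiffPosEmbedSketch(nn.Module):
    """Hypothetical sketch: add a learned per-position embedding to the
    conditioning vector z before it is passed to the diffusion loss."""

    def __init__(self, seq_len: int = 256, embed_dim: int = 768):
        super().__init__()
        # analogous to self.diffusion_pos_embed_learned in mar.py
        self.diffusion_pos_embed_learned = nn.Parameter(
            torch.zeros(1, seq_len, embed_dim)
        )
        nn.init.trunc_normal_(self.diffusion_pos_embed_learned, std=0.02)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, seq_len, embed_dim) decoder output; the embedding
        # is broadcast across the batch dimension
        return z + self.diffusion_pos_embed_learned
```

As the discussion notes, if the decoder already adds a position embedding to its input, z carries position information either way, so this extra addition is kept mainly for compatibility with the released checkpoints.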

chrisway613 commented 1 month ago

I found that the condition z should already contain the position information because of the position embedding added at the beginning of the decoder.

That's exactly what I thought! Thanks for your reply.