Why not directly use Emb(W) as X_0?

XiangLi1999 / Diffusion-LM

Apache License 2.0

Why not directly use Emb(W) as X_0? #56

Open leekum2018 opened 1 year ago

leekum2018 commented 1 year ago

Thanks for your nice work. I have a question that I'm having difficulty understanding: why not directly use $Emb(W)$ as $X_0$? The paper instead uses $X_0 = Emb(W) + N(0, \sigma_0 I)$. Looking forward to your reply, thanks!

Dawn-LX commented 1 year ago

+1, I also have this question

mimbres commented 1 year ago

FYI, this was discussed on OpenReview. According to the authors, $\sigma_0$ is set to 0.0001, so the distribution becomes a spiky Gaussian, and this was an empirical choice.
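
To get a feel for how small the difference is, here is a minimal sketch (not the repository's code) that samples $X_0$ around a made-up embedding. The function name `noisy_x0` and the example vector are hypothetical, and it assumes $\sigma_0$ is the variance in $N(0, \sigma_0 I)$, so the standard deviation is $\sqrt{\sigma_0} = 0.01$.

```python
import math
import random


def noisy_x0(embedding, sigma_0=1e-4, rng=None):
    """Sample x_0 ~ N(embedding, sigma_0 * I).

    Assumption: sigma_0 is a variance, so the per-coordinate
    standard deviation is sqrt(sigma_0).
    """
    if rng is None:
        rng = random.Random()
    std = math.sqrt(sigma_0)
    # Add independent Gaussian noise to every coordinate.
    return [e + rng.gauss(0.0, std) for e in embedding]


rng = random.Random(0)          # fixed seed so the run is repeatable
emb_w = [0.5, -1.2, 0.3, 2.0]   # hypothetical Emb(W) for a single token
x0 = noisy_x0(emb_w, rng=rng)

# Largest per-coordinate deviation of the sample from the embedding.
max_shift = max(abs(a - b) for a, b in zip(x0, emb_w))
print(max_shift)
```

Under that assumption the sample stays within a few hundredths of $Emb(W)$ in every coordinate, which is what "spiky Gaussian" suggests here. Setting `sigma_0=0.0` recovers $X_0 = Emb(W)$ exactly.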