XiangLi1999 / Diffusion-LM

Diffusion-LM
Apache License 2.0

Time embeddings #4

Open tuttyfrutyee opened 2 years ago

tuttyfrutyee commented 2 years ago

Hi,

Appendix F of the paper (fourth bullet point) states that time embeddings are incorporated by a softmax applied to learnable scaling and offsetting operations. I could not find the part of the code that does this. My first impression is that the time embedding is applied at line 902 of the "transformer_model2.py" file, where the learnable time_embeddings are simply added to the input features. Am I missing something? Thank you.

XiangLi1999 commented 2 years ago

Hi, thanks for the question! The model description in the appendix refers to a different model that runs diffusion on the simplex. The main model, which runs diffusion in the embedding space, just adds the time_embeddings directly to the input features.
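For readers following along, here is a minimal sketch of what "adding the time embedding directly to the input features" can look like. It is not the repository's code: the function names `timestep_embedding` and `add_time_embedding` are hypothetical, a fixed sinusoidal vector stands in for the learned embedding, and plain Python lists are used in place of tensors so the snippet runs without any dependencies.

```python
import math

def timestep_embedding(t, dim, max_period=10000.0):
    """Sinusoidal embedding of a scalar timestep t (hypothetical helper).

    Returns a list of length dim (dim assumed even): cosines first, then sines.
    A real model would usually pass this through a learned projection as well.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.cos(t * f) for f in freqs] + [math.sin(t * f) for f in freqs]

def add_time_embedding(features, t_emb):
    """Add the same time-embedding vector to every token position."""
    return [[x + e for x, e in zip(token, t_emb)] for token in features]

# 3 tokens with hidden size 4, all zeros so the added embedding is easy to see.
features = [[0.0] * 4 for _ in range(3)]
t_emb = timestep_embedding(t=0, dim=4)
out = add_time_embedding(features, t_emb)
```

The key point is that the timestep information enters once, at the input, as a plain elementwise sum broadcast over the sequence; there is no per-feature scaling or offsetting in this variant.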

XiangLi1999 commented 2 years ago

I will release the simplex model & code too in another branch in the future!

tuttyfrutyee commented 2 years ago

I see, thanks for the answer.

Did you try injecting the time_embeddings into the different layers of the transformer network, similar to what is done for diffusion models in vision?
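To make the question concrete, here is a rough sketch of per-layer injection, assuming each layer is simply a callable on a list of token vectors. The helper `run_with_per_layer_time` and the toy `double` layer are hypothetical and only illustrate the control flow; an actual implementation would typically give each layer its own learned projection of the time embedding.

```python
def run_with_per_layer_time(features, t_emb, layers):
    """Re-add the time embedding before every layer instead of only at the input.

    features: list of token vectors; t_emb: one vector of the same width;
    layers: list of callables mapping token vectors to token vectors.
    """
    h = features
    for layer in layers:
        # Inject timestep information again at this depth.
        h = [[x + e for x, e in zip(token, t_emb)] for token in h]
        h = layer(h)
    return h

def double(h):
    """Toy stand-in for a transformer layer: scales every feature by 2."""
    return [[2.0 * x for x in token] for token in h]

out = run_with_per_layer_time([[0.0, 0.0]], [1.0, 0.5], [double, double])
```

Compared with adding the embedding once at the input, this keeps the timestep signal available at every depth rather than relying on it surviving through earlier layers.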