XiangLi1999 / Diffusion-LM

Necessity of using diffusion model #17

Open ChorlingLau opened 2 years ago

ChorlingLau commented 2 years ago

Hi! If possible, could you explain why your team chose a diffusion model for the task of controllable text generation?

XiangLi1999 commented 2 years ago

Hi,

Thanks for the question. My primary motivation was to figure out a system that can flexibly do plug-and-play control for arbitrary control tasks specified via a classifier. We first tried autoregressive LMs with a continuous relaxation, but they don't work for any complex/fine-grained controls, not even phrase infilling. Rounding error is the primary bottleneck.
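
Roughly, the continuous-relaxation recipe looks like the sketch below (a minimal illustration, not our actual code; `lm_fluency_loss`, `control_loss`, and `embedding_table` are hypothetical stand-ins). The control is enforced on soft embedding vectors, and the final nearest-neighbor projection back to discrete tokens is the rounding step where the optimized control can be lost.

```python
import torch

def control_by_continuous_relaxation(lm_fluency_loss, control_loss,
                                     embedding_table, seq_len,
                                     num_steps=50, lr=0.1):
    """Optimize a relaxed (soft) token sequence against an LM fluency term and a
    classifier-based control term, then round back to discrete tokens."""
    vocab_size, emb_dim = embedding_table.shape
    # One continuous vector per position instead of a discrete token.
    soft_embeds = torch.randn(seq_len, emb_dim, requires_grad=True)
    optimizer = torch.optim.Adam([soft_embeds], lr=lr)

    for _ in range(num_steps):
        optimizer.zero_grad()
        # Both terms are differentiable w.r.t. the continuous embeddings.
        loss = lm_fluency_loss(soft_embeds) + control_loss(soft_embeds)
        loss.backward()
        optimizer.step()

    # "Rounding": snap each optimized vector to its nearest word embedding.
    # The snapped tokens can land far from the optimized vectors, so a control
    # satisfied in continuous space is often lost after rounding.
    dists = torch.cdist(soft_embeds.detach(), embedding_table)  # (seq_len, vocab)
    return dists.argmin(dim=-1)  # token ids
```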

So we switched to a diffusion-based model. The intuition is that the diffusion model is trained to do rounding across diffusion steps (my intuition is that better rounding is tied to denoising well).
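
For intuition, here is a bare-bones sketch of what "rounding across diffusion steps" means (again, not the code in this repo; `denoise_model` and `embedding_table` are hypothetical stand-ins): at every reverse step the predicted clean embeddings are snapped to the nearest words, so the model has to produce latents that already sit close to real word embeddings instead of rounding only once at the end.

```python
import torch

@torch.no_grad()
def sample_with_stepwise_rounding(denoise_model, embedding_table,
                                  seq_len, emb_dim, num_steps=200):
    """Toy reverse-diffusion loop over word embeddings with per-step rounding."""
    x_t = torch.randn(seq_len, emb_dim)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        # Predict the clean embeddings x_0 from the current noisy latent.
        x0_hat = denoise_model(x_t, t)
        # Per-step rounding: snap the prediction to the nearest word embeddings,
        # keeping every intermediate state close to actual words.
        nearest = torch.cdist(x0_hat, embedding_table).argmin(dim=-1)
        x0_hat = embedding_table[nearest]
        # Re-noise toward step t-1; a real sampler would use the diffusion
        # posterior q(x_{t-1} | x_t, x_0) with a proper noise schedule.
        noise_scale = t / num_steps
        x_t = x0_hat + noise_scale * torch.randn_like(x0_hat)
    # Final rounding from embeddings to token ids.
    return torch.cdist(x_t, embedding_table).argmin(dim=-1)
```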

Randolph-zeng commented 2 years ago

Hi @XiangLi1999, sorry to jump into this interesting discussion. I am curious about your comparison between autoregressive LMs and diffusion models, and your conclusion that "Rounding error is the primary bottleneck". Would you mind elaborating on why GPT-2-style models are bad at fine-grained controls because of rounding error? Thanks a lot!