Hzfinfdu / Diffusion-BERT

ACL'2023: DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models
Apache License 2.0

GPT mentioned in Figure 3 #1

Open jzhang38 opened 1 year ago

jzhang38 commented 1 year ago

Dear authors,

Thanks for open-sourcing your wonderful work.

You mention GPT in Figure 3 when comparing the Pareto front across different models ("AR models of the same size"). May I ask if this is a pre-trained GPT (e.g. GPT2-small) finetuned on the LM1B dataset, or a model with the GPT architecture trained from scratch on the LM1B training set?

Hzfinfdu commented 1 year ago

Hi,

Thank you for your question! We include both models in Figure 3. The red curve, which is rather close to our DiffusionBERT, stands for an AR model trained from scratch, and the green one for a finetuned GPT2. In general, DiffusionBERT still falls behind pretrained AR models in terms of generation quality.

Best, Zhengfu