XiangLi1999 / Diffusion-LM

Diffusion-LM
Apache License 2.0

Strength of classifiers vs. results: do all baselines in the paper use the same classifier as Diffusion-LM? #22

Open jpilaul opened 2 years ago

jpilaul commented 2 years ago

Thank you so much for making your code public. Again, very interesting work! This is not really an issue but more of a question.

I have a feeling that the strength of the classifier really matters in plug-and-play controlled text generation. I was wondering if the models that you compare against all use the same classifiers. If not, can you comment on differences (number of parameters, pretrained or not, architectures, etc...)?

Thank you.

XiangLi1999 commented 2 years ago

I think the strength of the classifier does matter. The expressiveness of the classifier is one aspect, but I think the more important bottleneck is the signal-passing interface between the classifier and the LM. Since the AR LM generates left-to-right, its classifier can only learn to take in partial inputs, which hurts its performance compared to a classifier that takes in the full sequence.
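(A minimal toy sketch of the interface difference described above — not the Diffusion-LM code, and `toy_classifier` is a hypothetical stand-in for a learned attribute classifier:)

```python
def toy_classifier(tokens):
    # Stand-in for a learned attribute classifier: scores a sequence
    # by counting occurrences of a target attribute token.
    return sum(1 for t in tokens if t == "positive")

def ar_guided_step(prefix, candidates):
    # Autoregressive plug-and-play control: the classifier only ever
    # sees the partial, left-to-right prefix plus one candidate token.
    return max(candidates, key=lambda c: toy_classifier(prefix + [c]))

def diffusion_guided_step(full_draft, position, candidates):
    # Diffusion-style control: the classifier scores the entire
    # (iteratively refined) draft, so an edit at any position is
    # judged in full-sequence context.
    def with_token(c):
        draft = list(full_draft)
        draft[position] = c
        return draft
    return max(candidates, key=lambda c: toy_classifier(with_token(c)))

prefix = ["the", "movie", "was"]
print(ar_guided_step(prefix, ["positive", "long"]))  # -> "positive"

draft = ["the", "movie", "was", "long"]
print(diffusion_guided_step(draft, 3, ["positive", "long"]))  # -> "positive"
```

In the real setting the classifier is a trained network and diffusion guidance uses gradients on continuous latents rather than candidate scoring, but the contrast is the same: the AR classifier is trained on prefixes, the diffusion classifier on full sequences.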

To answer the question: no, I am not using the same classifier. All the classifiers are trained from scratch. The classifiers used for Diffusion-LM are ~80M-parameter transformer models. For semantic content control, we use a left-to-right model; for all the other tasks, we use a bidirectional one.