XiangLi1999 / Diffusion-LM

Diffusion-LM
Apache License 2.0

Strength of classifiers vs. results: do all baselines in the paper use the same classifier as Diffusion-LM? #22

Open jpilaul opened 2 years ago

jpilaul commented 2 years ago

Thank you so much for making your code public. Again, very interesting work! This is not really an issue but more of a question.

I have a feeling that the strength of the classifier really matters in plug-and-play controlled text generation. I was wondering if the models that you compare against all use the same classifiers. If not, can you comment on differences (number of parameters, pretrained or not, architectures, etc...)?

Thank you.

XiangLi1999 commented 2 years ago

I think the strength of the classifier does matter. The expressiveness of the classifier is one aspect, but I think the more important bottleneck is the signal-passing interface between the classifier and the LM. Since the AR LM generates left-to-right, its classifier can only learn to take in partial inputs, which hurts its performance compared to a classifier that takes in the full sequence.
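(A minimal toy sketch of the interface difference described above — not the Diffusion-LM code, and `toy_classifier` is a hypothetical stand-in for a learned attribute classifier:)

```python
def toy_classifier(tokens):
    # Stand-in for a learned attribute classifier: scores a sequence
    # by counting occurrences of a target attribute token.
    return sum(1 for t in tokens if t == "positive")

def ar_guided_step(prefix, candidates):
    # Autoregressive plug-and-play control: the classifier only ever
    # sees the partial, left-to-right prefix plus one candidate token.
    return max(candidates, key=lambda c: toy_classifier(prefix + [c]))

def diffusion_guided_step(full_draft, position, candidates):
    # Diffusion-style control: the classifier scores the entire
    # (iteratively refined) draft, so an edit at any position is
    # judged in full-sequence context.
    def with_token(c):
        draft = list(full_draft)
        draft[position] = c
        return draft
    return max(candidates, key=lambda c: toy_classifier(with_token(c)))

prefix = ["the", "movie", "was"]
print(ar_guided_step(prefix, ["positive", "long"]))  # -> "positive"

draft = ["the", "movie", "was", "long"]
print(diffusion_guided_step(draft, 3, ["positive", "long"]))  # -> "positive"
```

In the real setting the classifier is a trained network and diffusion guidance uses gradients on continuous latents rather than candidate scoring, but the contrast is the same: the AR classifier is trained on prefixes, the diffusion classifier on full sequences.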

To answer the question: no, I am not using the same classifier. All the classifiers are trained from scratch. The classifiers used for Diffusion-LM are ~80M-parameter transformer models. For semantic content control, we use a left-to-right model; for all the other tasks, we use a bidirectional one.