
Paths to models/hf hub, extra transformer subclasses #7

Closed · jwkirchenbauer closed this issue 2 years ago

jwkirchenbauer commented 2 years ago

Hi there, great work on this paper!

I was just trying to run some of the code to understand your full pipeline, and I was able to "Train Diffusion-LM" using scripts/run_train.py under improved-diffusion. However, the next utility, scripts/batch_decode.py, references a series of paths like predictability/diff_models/... which I'm pretty sure were all local paths during development, as the code is unable to pull them from the Hugging Face model hub. The failure occurs after generation, when scripts/ppl_under_ar.py is launched: it can't fetch this model from the hub: https://huggingface.co/predictability/diff_models/e2e-tgt_e=15_b=20_m=gpt2_wikitext-103-raw-v1_101_None/resolve/main/config.json
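(For anyone who hits the same error: from_pretrained interprets that string as a Hub repo id, so the download 404s because the checkpoint was never uploaded. A minimal sketch of the workaround, assuming a standard transformers setup; the local path below is a hypothetical placeholder:)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# What the script effectively attempts: the string is resolved as a
# Hugging Face Hub repo id, and the download fails because the repo
# was never uploaded.
# model = AutoModelForCausalLM.from_pretrained(
#     "predictability/diff_models/e2e-tgt_e=15_b=20_m=gpt2_wikitext-103-raw-v1_101_None"
# )

# Workaround: from_pretrained also accepts a local directory, so point it
# at wherever your own fine-tuned teacher GPT-2 checkpoint lives.
local_dir = "path/to/local/teacher_gpt2_checkpoint"  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(local_dir)
```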

My understanding is that ppl_under_ar.py is supposed to use your "teacher LM (i.e., a carefully fine-tuned GPT-2 model)" to assess the generation quality of the diffusion model trained by scripts/run_train.py (my trained diffusion model is at Diffusion-LM/improved-diffusion/diffusion_models/diff_e2e-tgt_block_rand16_transformer_lr0.0001_0.0_2000_sqrt_Lsimple_h128_s2_d0.1_sd102_xstart_e2e).

I assume that you used your modified run_clm.py at Diffusion-LM/transformers/examples/pytorch/language-modeling to fine-tune a GPT-2 model for measuring the perplexity of the generated samples. Is this correct? And if so, one could use any AR LM; they just won't get the same perplexities/"lm-scores" you reported?
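(For concreteness, here is roughly what I'd expect the teacher-LM scoring to look like, assuming standard transformers APIs; teacher_dir and the sample text are placeholders, and the actual ppl_under_ar.py may differ in details like batching and length normalization.)

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_dir = "path/to/finetuned_gpt2"  # hypothetical local teacher checkpoint
tokenizer = AutoTokenizer.from_pretrained(teacher_dir)
model = AutoModelForCausalLM.from_pretrained(teacher_dir).eval()

@torch.no_grad()
def lm_score(text: str) -> float:
    """Perplexity of one generated sample under the teacher LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    # With labels == input_ids, the model returns the mean token-level
    # cross-entropy (inputs are shifted internally for causal LM).
    loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

samples = ["The Blue Spice is a cheap pub near the riverside."]  # generated text
print(sum(lm_score(s) for s in samples) / len(samples))
```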

Thanks for helping clarify how the components work and what models are available for download (if any).

PS ... what are the various custom subclass models for? And why the "compression"? I can't quite match these ideas or these models to the paper: {GPT2LMHeadModelCompress, BERTModelCompress, AutoEncoderWithNoise, AR_for_cont, GPT2VAE}. Is the compression/down-projection your way of enabling the model to diffuse in a reduced-dimensional space?

XiangLi1999 commented 2 years ago

Hi,

Thanks for reading the paper and code in depth!

Right, those models that I used for evaluating the lm-score are local and not public on Hugging Face. Your understanding is right: they are the teacher LM.

Correct, you can use run_clm.py to fine-tune your own AR model and evaluate the perplexity of generated samples. You will likely see the same trend, but probably not the exact lm-score numbers.

I can look for the AR teacher-LM models and put them on a Google Drive [TODO], if needed.

Those subclasses are part of preliminary experiments not mentioned in the paper. I tried them as alternative embedding functions, as opposed to the "end-to-end training objective" finally proposed in the paper. You can safely ignore these classes; if it helps intuition, the "compression" variants are roughly in the spirit of the sketch below.
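(Illustration only, not the actual classes: the idea is to wrap a token embedding with a learned down-projection so the diffusion process runs over compact continuous vectors instead of full-width hidden states. All names and dimensions below are made up for the example.)

```python
import torch
import torch.nn as nn

class DownProjectedEmbedding(nn.Module):
    """Toy sketch: embed tokens, then project into a lower-dimensional
    space so diffusion can operate on compact continuous vectors."""

    def __init__(self, vocab_size: int = 50257, hidden: int = 768, latent: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.down = nn.Linear(hidden, latent)  # the "compression"
        self.up = nn.Linear(latent, hidden)    # map back for decoding/rounding

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # (batch, seq) -> (batch, seq, latent): the space diffusion runs in.
        return self.down(self.embed(token_ids))

emb = DownProjectedEmbedding()
z = emb(torch.randint(0, 50257, (2, 16)))
print(z.shape)  # torch.Size([2, 16, 16])
```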

Best, Lisa

jwkirchenbauer commented 2 years ago

Awesome! I'll be on the lookout for those model checkpoints used for lm-score, in case you get a chance to push them up.