XiangLi1999 / ContrastiveDecoding


Could you provide scripts to reproduce the results? #7

Open hzhwcmhf opened 1 year ago

hzhwcmhf commented 1 year ago

@XiangLi1999 Thanks for your great work!

I am trying to reproduce the results on wikitext but have run into some problems.

I use your script:

python run_generation.py --model_name_or_path gpt2-xl --model_type gpt2 --length 256 --prompt_file wikitext --student_name_or_path gpt2 --st_coef 1.0 --student_temperature 0.5 --outfile outputs/temp_out.jsonl --ignore_prefix no

Then I evaluate the output file with:

python eval_script.py ./outputs/temp_out.jsonl

The output is

{'name': './outputs/temp_out.jsonl', 'rep-2': 9.5, 'rep-3': 1.87, 'rep-4': 0.4, 'diversity': 0.8845241939999999, 'mauve': 0.8812567264373257, 'coherence': 0.5913593170305366} (I disabled the other metrics)

which differs from the results reported in the paper (coherence = 0.59 vs. 0.69).
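
As a sanity check on my eval run, the diversity value is consistent with the usual product over the n-gram repetition rates. A quick sketch of that relationship (my own check, assuming eval_script.py uses the standard definition):

```python
# Sanity check (my own sketch, assuming eval_script.py uses the standard
# definition): diversity = product over n of (1 - rep-n / 100).
rep_n = [9.5, 1.87, 0.4]  # rep-2, rep-3, rep-4 from the output above
diversity = 1.0
for r in rep_n:
    diversity *= 1.0 - r / 100.0
print(diversity)  # 0.884524194, matching the 'diversity' value above
```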

I find that ./outputs_ignorePrefix_ccnews_256/wikitext_results/wikitext_gpt2-1.0-t0.5_gpt2-xl_256.jsonl produces the correct metric values. May I ask two questions:

  1. What is the generation script used to produce the correct outputs?
  2. What do the values in the filename wikitext_gpt2-1.0-t0.5_gpt2-xl_256.jsonl mean? For example, 256 seems to be the output length and 0.5 the student temperature. What does the 1.0 indicate?
XiangLi1999 commented 1 year ago

Hi

  1. I think setting ignore_prefix to yes would increase the coherence score!

     python run_generation.py --model_name_or_path gpt2-xl --model_type gpt2 --length 256 --prompt_file wikitext --student_name_or_path gpt2 --st_coef 1.0 --student_temperature 0.5 --outfile outputs/temp_out.jsonl --ignore_prefix yes

  2. The 1.0 is st_coef; the scoring works like logP_expert - st_coef * logP_amateur.
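
For context, a minimal sketch of that scoring rule (illustrative only, not the repo's actual run_generation.py code; the alpha plausibility cutoff is an assumption based on the paper's adaptive plausibility constraint):

```python
# Illustrative sketch of the contrastive score described above, not the repo's code.
# `expert_logits` / `amateur_logits` stand in for gpt2-xl and gpt2 next-token logits.
import torch
import torch.nn.functional as F

def contrastive_scores(expert_logits, amateur_logits,
                       st_coef=1.0, student_temperature=0.5, alpha=0.1):
    # log-probabilities of the expert and the amateur (student);
    # the amateur distribution is sharpened by student_temperature < 1
    log_p_expert = F.log_softmax(expert_logits, dim=-1)
    log_p_amateur = F.log_softmax(amateur_logits / student_temperature, dim=-1)

    # contrastive objective: logP_expert - st_coef * logP_amateur
    scores = log_p_expert - st_coef * log_p_amateur

    # plausibility constraint (assumption, following the paper): only keep tokens
    # whose expert probability is at least alpha * max expert probability
    cutoff = torch.log(torch.tensor(alpha)) + log_p_expert.max(dim=-1, keepdim=True).values
    return scores.masked_fill(log_p_expert < cutoff, float("-inf"))

# usage: pick the next token greedily from the contrastive scores
expert_logits = torch.randn(1, 50257)   # stand-in for gpt2-xl logits
amateur_logits = torch.randn(1, 50257)  # stand-in for gpt2 logits
next_token = contrastive_scores(expert_logits, amateur_logits).argmax(dim=-1)
```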