Muennighoff opened 2 years ago
cc @lintangsutawika @haileyschoelkopf - can't add you as reviewers somehow, but would be great if you could take a look. I'm not 100% sure about the results I got 🧐
Will take a closer look.
@Muennighoff so the intended result is supposed to be that with Prefix-LM the performance should be higher, right? However, based on the scores you shared, that does not seem to be the case.
Yeah, so according to the current results, evaluating the model as a causal LM is better than as a prefix LM, even after it was fine-tuned as a prefix LM. Also note the
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
paper, i.e. the results for
CD:FLM (219B) + CD:MTF (13B)
CD:FLM (219B) + ND:MTF (13B)
This PR adapts evaluation to work with Prefix LMs, such as those used for the T0 finetuning experiments.
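For background, the difference between causal and prefix evaluation is only the attention mask: a prefix LM attends bidirectionally over the input (prefix) tokens and causally over the rest. A minimal sketch of the two masks (helper names are illustrative, not this repo's API):

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Lower-triangular mask: position i may attend to positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def prefix_lm_mask(seq_len: int, prefix_len: int) -> np.ndarray:
    """Same as the causal mask, except tokens inside the prefix also
    attend to later prefix tokens (bidirectional over the prefix)."""
    mask = causal_mask(seq_len)
    mask[:prefix_len, :prefix_len] = True  # un-mask the prefix block
    return mask
```

With `prefix_len=0` this reduces to plain causal attention, which is why the same checkpoint can be evaluated either way.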
Using the normal eval harness I get the following results:
Using
`CHECKPOINT_PATH=$six_ALL_CCFRSCRATCH/checkpoints/tr11f-6B3-ml/checkpoints/main/global_step163750`
(checkpoint prior to MTF): copa "acc": 0.58

Using
`CHECKPOINT_PATH=/gpfsscratch/rech/six/commun/checkpoints/tr13f-6B3-ml-t0/checkpoints/prefix/global_step2000`
: copa "acc": 0.7

Using
`CHECKPOINT_PATH=/gpfsscratch/rech/six/commun/checkpoints/tr13f-6B3-ml-t0/checkpoints/prefix/global_step3100`
: copa "acc": 0.67

Using
`CHECKPOINT_PATH=/gpfsscratch/rech/six/commun/checkpoints/tr13f-6B3-ml-t0/checkpoints/prefix/global_step3100`
without `--prefix`: copa "acc": 0.73