allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

triviaqa results not reproducible #134

Open songwang41 opened 4 years ago

songwang41 commented 4 years ago
python3 -m scripts.triviaqa  --train_dataset $QA_PATH/processed/squad-wikipedia-train-4096.json      \
--dev_dataset $QA_PATH/processed/squad-wikipedia-dev-4096.json      \
--gpus 0  --num_workers 4 \
--max_seq_len 4096 --doc_stride -1  \
--save_prefix triviaqa-longformer-large  \
--model_path models/longformer-large-4096  \
--test

During evaluation, I only got this score:

{'exact_match': 0.025021894157387713, 'f1': 4.5473948151449575, 'common': 7993, 'denominator': 7993, 'pred_len': 7993, 'gold_len': 7993}

ibeltagy commented 3 years ago

Are you sure models/longformer-large-4096 is the triviaqa pretrained checkpoint and not the vanilla longformer?

antoniogois commented 3 years ago

I'm running into the same issue. Did anyone find out what was wrong?

According to cheatsheet.txt:

    --save_prefix triviaqa-longformer-large  \  # pretrained pytorch-lighting checkpoint
    --model_path path/to/pretrained/longformer-large-4096  \  # loaded but not used

But from @ibeltagy's comment, it seems like the checkpoint should go in --model_path. Which one is correct?

--model_path expects a path to a directory containing a config.json file, so I can't pass the downloaded checkpoint to that flag. But if I pass it to --save_prefix, following the cheatsheet's instructions, I get very low results, similar to @songwanguw.
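
For context, I think --model_path is expected to point at a directory shaped like the released model folder, roughly like this (contents inferred from config.json being required and the pytorch_model.bin naming below, so this may be incomplete):

    models/longformer-large-4096/
        config.json         # model configuration
        pytorch_model.bin   # model weights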

antoniogois commented 3 years ago

OK, it seems it's --save_prefix that is ignored, not --model_path. I then tried grabbing the downloaded triviaqa-longformer-large/checkpoints/_ckpt_epoch_4_v2.ckpt, renaming it to pytorch_model.bin, and placing it in the --model_path folder (overwriting the vanilla model I had there).
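
Concretely, that attempt amounts to something like this (the paths match my setup above; spoiler: it did not work):

    import shutil

    # copy the downloaded pytorch-lightning checkpoint over the vanilla weights
    shutil.copy(
        "triviaqa-longformer-large/checkpoints/_ckpt_epoch_4_v2.ckpt",
        "models/longformer-large-4096/pytorch_model.bin",
    )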

However, I still get very low values. Any ideas on what to try?

antoniogois commented 3 years ago

Solved. Indeed, those very low results came from a model that wasn't fine-tuned for triviaqa. To properly load the provided checkpoint, follow cheatsheet.txt with these exceptions:

--save_prefix choose-a-name-for-output-dir
--model_path path/to/pretrained/longformer-large-4096  # path to the folder of the downloaded model pretrained with masked LM; creating your own roberta-large-4096 following "convert_model_to_long.ipynb" will not work here
--resume_ckpt path/to/triviaqa-longformer-large/checkpoints/fixed_ckpt_epoch_4_v2.ckpt  # path to the downloaded model finetuned for triviaqa
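
Putting those together with the command at the top of this thread, the test invocation would look roughly like this (the paths are placeholders for my setup):

    python3 -m scripts.triviaqa  --train_dataset $QA_PATH/processed/squad-wikipedia-train-4096.json \
        --dev_dataset $QA_PATH/processed/squad-wikipedia-dev-4096.json \
        --gpus 0  --num_workers 4 \
        --max_seq_len 4096 --doc_stride -1 \
        --save_prefix choose-a-name-for-output-dir \
        --model_path path/to/pretrained/longformer-large-4096 \
        --resume_ckpt path/to/triviaqa-longformer-large/checkpoints/fixed_ckpt_epoch_4_v2.ckpt \
        --test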

However, fixed_ckpt_epoch_4_v2.ckpt will fail to load as-is. To fix it, load the file in a Python console with torch.load(), apply these changes, and save it back with torch.save():

checkpoint["state_dict"]["model.embeddings.position_ids"] = torch.arange(4098).to('cuda').unsqueeze(0)
checkpoint["checkpoint_callback_best_model_path"]=""  # some versions of pytorch lightning may not need this

I'll create a pull request adding these comments to cheatsheet.txt.