rbiswasfc closed this 4 days ago
This looks good. Can you add a smoketest for ablation_eval.py like we have for glue.py? Then it will be ready to merge.
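A smoke test only checks that a script runs end to end without crashing; it says nothing about score quality. Below is a minimal, generic sketch of such a check. It assumes the script is launched from the command line as `python ablation_eval.py <config>`. The helper name `run_smoketest` and the default timeout are hypothetical, and this is not the repository's actual glue.py test.

```python
import subprocess
import sys
from typing import Sequence


def run_smoketest(cmd: Sequence[str], timeout: float = 600.0) -> bool:
    """Run a command end to end and report whether it exited cleanly.

    Hypothetical helper: it only checks the exit status (and kills the
    process if it exceeds `timeout` seconds). It does not inspect scores.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so just report failure.
        return False
    if result.returncode != 0:
        # Surface the child's stderr so a failing run is easy to debug.
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0
```

Here is an example, assuming a small test config exists (the path is hypothetical): `assert run_smoketest([sys.executable, "ablation_eval.py", "path/to/tiny-test-config.yaml"])` would fail whenever the evaluation crashes. Checking only the exit code keeps the test fast and independent of the low-score issue discussed below.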
Thanks for the suggestion; I just added the smoketest!
LGTM. As raised, the results are still a bit low, but this seems OK: the issue does not come from this PR, since it was already present in my prior tests.
This PR adds a script and configs to run ablation evaluations on 4 datasets (MNLI, WiC, BoolQ, EurLEX). BoolQ and WiC fine-tuning uses the checkpoint obtained from the MNLI training run.
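Reusing the MNLI checkpoint for BoolQ and WiC imposes an ordering: MNLI must finish before either of those tasks starts. The sketch below shows one way to express that ordering, using the standard library's `graphlib.TopologicalSorter`. The `TASK_DEPS` mapping and the helper names are hypothetical and not taken from ablation_eval.py. That EurLEX (like MNLI) starts from the pretrained model is an assumption inferred from the description above.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Set

# Hypothetical mapping: each task lists the task whose fine-tuned checkpoint it
# starts from. BoolQ and WiC start from MNLI (per the PR description); MNLI and
# EurLEX are ASSUMED here to start from the pretrained model (no dependency).
TASK_DEPS: Dict[str, Set[str]] = {
    "mnli": set(),
    "boolq": {"mnli"},
    "wic": {"mnli"},
    "eurlex": set(),
}


def run_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every task runs after the tasks it depends on."""
    # TopologicalSorter expects a mapping of node -> its predecessors.
    return list(TopologicalSorter(deps).static_order())


def init_source(task: str, deps: Dict[str, Set[str]]) -> Optional[str]:
    """Name the task whose checkpoint `task` starts from, or None for the pretrained model."""
    parents = sorted(deps[task])
    return parents[0] if parents else None
```

With this layout, MNLI and EurLEX can run in parallel, while BoolQ and WiC wait only on MNLI. A single chained script would serialize all four tasks unnecessarily.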
The eval for Flex BERT can be run using:

python ablation_eval.py yamls/ablations/flex-bert-ablation-eval.yaml

which gives the following scores: