UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Models get worse after fine-tuning with GPL #1891

Open · HenryL27 opened this issue 1 year ago

HenryL27 commented 1 year ago

Hey all, I'm trying to fine-tune MiniLM on the fiqa dataset using GPL, and I've essentially followed the instructions in the GPL README. I've changed my learning rate to 2e-6 (from the default 2e-5). I've used a different base checkpoint ("all-MiniLM-L6-v2" from SentenceTransformers), and I've also tried "multi-qa-distilbert-cos-v1". I've tried fine-tuning on trec-covid instead. I've thrown other cross-encoders at the pseudo-labeling step. I've tried removing the normalization layers from the models (so they train with dot-product similarity rather than cosine). I've even rewritten the GPL library myself.
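For reference, here's roughly the kind of call I'm running, adapted from the example in the GPL README (the paths and most values below are placeholders rather than my exact config, and the 2e-6 learning-rate change isn't shown in this call):

```python
import gpl

# Sketch adapted from the GPL README example; paths/values are placeholders.
dataset = "fiqa"
gpl.train(
    path_to_generated_data=f"generated/{dataset}",
    base_ckpt="sentence-transformers/all-MiniLM-L6-v2",  # also tried "multi-qa-distilbert-cos-v1"
    gpl_score_function="dot",
    batch_size_gpl=32,
    gpl_steps=140000,  # most of my runs stopped at 5k-15k steps
    output_dir=f"output/{dataset}",
    evaluation_data=f"./{dataset}",
    evaluation_output=f"evaluation/{dataset}",
    generator="BeIR/query-gen-msmarco-t5-base-v1",
    retrievers=["msmarco-distilbert-base-v3", "msmarco-MiniLM-L-6-v3"],
    retriever_score_functions=["cos_sim", "cos_sim"],
    cross_encoder="cross-encoder/ms-marco-MiniLM-L-6-v2",
    qgen_prefix="qgen",
    do_evaluation=True,
)
```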

Most of these tests ran for about 5k-15k training steps, although I did run one overnight test for the tutorial's full 140k steps, to no avail. In every case the evaluations are going in the wrong direction: more training => worse results.

I have a headache now. Y'all are probably going to tell me I'm overfitting, but I find it strange that the tutorial example from the GPL repo would overfit. Anyway, what can I do about overfitting my data (if that is what's going on)?

Thanks,

HenryL27 commented 1 year ago

Oh, here's the GPL repo I'm referring to: https://github.com/UKPLab/gpl

HenryL27 commented 1 year ago

I also wrote a MarginMSE evaluator and handed it a subset of the training data. Interestingly, it appears that the loss values start increasing (!) after a couple hundred steps. So something is very, very wrong, right?
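In case it's useful, here's a sketch of what I mean by a MarginMSE evaluator (a hypothetical class of my own, not part of sentence-transformers): it just reports the MarginMSE objective on held-out (query, positive, negative) triples whose margin labels come from the cross-encoder.

```python
import torch
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import SentenceEvaluator


class MarginMSEEvaluator(SentenceEvaluator):
    """Hypothetical evaluator: measures the MarginMSE objective on a held-out
    set of (query, positive, negative) triples with cross-encoder margin labels,
    i.e. margin = CE(query, positive) - CE(query, negative)."""

    def __init__(self, queries, positives, negatives, margins, batch_size=32):
        self.queries = queries
        self.positives = positives
        self.negatives = negatives
        self.margins = torch.tensor(margins, dtype=torch.float)
        self.batch_size = batch_size

    def __call__(self, model: SentenceTransformer, output_path=None, epoch=-1, steps=-1) -> float:
        # Embed everything with the current state of the model.
        q = model.encode(self.queries, batch_size=self.batch_size, convert_to_tensor=True)
        p = model.encode(self.positives, batch_size=self.batch_size, convert_to_tensor=True)
        n = model.encode(self.negatives, batch_size=self.batch_size, convert_to_tensor=True)

        # Student margin via row-wise dot-product scores.
        student_margin = (q * p).sum(dim=-1) - (q * n).sum(dim=-1)
        mse = torch.nn.functional.mse_loss(student_margin, self.margins.to(student_margin.device))

        # Return the negated MSE so that "higher is better", which is what
        # SentenceTransformer.fit expects when saving the best checkpoint.
        return -mse.item()
```

The score this returns (i.e. the negated MarginMSE loss) is the number that starts moving in the wrong direction after a couple hundred steps.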

lnatspacy commented 1 year ago

I have a very similar issue, did you find out what's wrong?

HenryL27 commented 1 year ago

nope. you?