Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
What are the effects of overfitting for downstream tasks? #38
I was trying to adapt the `sentence-transformers/multi-qa-mpnet-base-dot-v1` model to the financial domain with GPL, using SEC data. I trained the model with the following hyperparameters:
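(I'm not reproducing my exact values here; the snippet below is just a sketch of the shape of the training call, with placeholder paths and settings, following the `gpl.train` API from this repo.)

```python
import gpl

gpl.train(
    # Paths below are made up for illustration.
    path_to_generated_data="generated/sec",  # unlabeled SEC passages + generated queries live here
    base_ckpt="sentence-transformers/multi-qa-mpnet-base-dot-v1",
    gpl_score_function="dot",  # matches the dot-product base model
    batch_size_gpl=32,         # placeholder, not my actual batch size
    gpl_steps=140000,          # placeholder; I trained for epochs rather than a fixed step count
    new_size=-1,               # use the full corpus
    queries_per_passage=-1,    # let GPL choose automatically
    output_dir="output/sec-gpl",
    generator="BeIR/query-gen-msmarco-t5-base-v1",  # T5 query generator for pseudo queries
    retrievers=["msmarco-distilbert-base-v3", "msmarco-MiniLM-L-6-v3"],  # negative miners
    retriever_score_functions=["cos_sim", "cos_sim"],
    cross_encoder="cross-encoder/ms-marco-MiniLM-L-6-v2",  # pseudo labeler
    qgen_prefix="qgen",
)
```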
My loss curves were as follows:
It seems like the model itself is overfitting, but the performance of the trained model is not up to the mark even when I used early stopping. I trained one for 3 epochs, and the unadapted model performs better than the trained ones. I was wondering if I could get some insights into why this is. I don't really know where to ask this question; if there is a better place for it, please let me know and I will take it there, especially because this is more of a theoretical question than something tied to this library.

I am relatively new to training models, so please let me know if I am making any obvious mistakes here (or if any other information is required).
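For context, this is roughly how I compared the adapted and unadapted checkpoints; it's a sketch using BEIR's evaluation utilities, with a placeholder dataset path and output directory rather than my actual ones:

```python
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Load a BEIR-format eval split; "sec-eval" is a placeholder path.
corpus, queries, qrels = GenericDataLoader("sec-eval").load(split="test")

for ckpt in [
    "sentence-transformers/multi-qa-mpnet-base-dot-v1",  # unadapted baseline
    "output/sec-gpl",                                    # GPL-adapted checkpoint (placeholder path)
]:
    retriever = EvaluateRetrieval(
        DRES(models.SentenceBERT(ckpt), batch_size=128),
        score_function="dot",  # dot product, matching the base model's training objective
    )
    results = retriever.retrieve(corpus, queries)
    ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
    print(ckpt, ndcg["NDCG@10"])
```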