UKPLab / gpl

Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577

What are the effects of overfitting for downstream tasks? #38

Open SnoozingSimian opened 1 year ago

SnoozingSimian commented 1 year ago

I was trying to adapt the sentence-transformers/multi-qa-mpnet-base-dot-v1 model to the financial domain with GPL, using SEC filings as the corpus.

I trained the model with the following hyperparams:

```json
{
  "learning_rate": 0.00002,
  "num_examples_eval": 1000,
  "num_examples_train": 20000,
  "num_epochs": 15
}
```
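For reference, the run looked roughly like the gpl.train example from the GPL README, as in the sketch below. The paths and model names are placeholders, and gpl_steps is just my num_examples_train × num_epochs converted to steps at the batch size, so treat this as a sketch rather than my exact script:

```python
import gpl

gpl.train(
    path_to_generated_data="generated/sec",  # placeholder: generated queries + pseudo labels
    base_ckpt="sentence-transformers/multi-qa-mpnet-base-dot-v1",
    gpl_score_function="dot",                # the base model was trained for dot-product scores
    batch_size_gpl=32,
    gpl_steps=9375,                          # ~ 20000 examples * 15 epochs / batch size 32
    new_size=-1,                             # use the full corpus
    queries_per_passage=-1,                  # let GPL choose automatically
    output_dir="output/sec-mpnet-gpl",       # placeholder output path
    generator="BeIR/query-gen-msmarco-t5-base-v1",
    retrievers=["msmarco-distilbert-base-v3", "msmarco-MiniLM-L-6-v3"],
    retriever_score_functions=["cos_sim", "cos_sim"],
    cross_encoder="cross-encoder/ms-marco-MiniLM-L-6-v2",
    qgen_prefix="qgen",
)
```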

My loss curves were as follows: [train loss curve and validation loss curve plots attached].

It seems like the model is overfitting, but the performance of the adapted model is not up to the mark even when I use early stopping: I trained one for just 3 epochs, and the unadapted model still performs better than the trained ones. I was wondering if I could get some insight into why this is. I don't really know where to ask this question; if there is a more suitable place for it, please let me know and I will take it there, especially since this is more of a theoretical question than something tied to this library.
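In case it helps, this is roughly how I compared the adapted and unadapted checkpoints, using sentence-transformers' InformationRetrievalEvaluator on a held-out SEC split (the queries/corpus/qrels below are placeholder examples):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Held-out eval split (placeholder data): id -> text, and query id -> relevant doc ids.
queries = {"q1": "What were the company's net revenues in fiscal 2020?"}
corpus = {"d1": "Net revenues for fiscal year 2020 were $4.2 billion, an increase of ..."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="sec-heldout")

for ckpt in [
    "sentence-transformers/multi-qa-mpnet-base-dot-v1",  # unadapted baseline
    "output/sec-mpnet-gpl",                              # GPL-adapted checkpoint (placeholder path)
]:
    model = SentenceTransformer(ckpt)
    print(ckpt, evaluator(model))  # prints the evaluator's main retrieval score per checkpoint
```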

I am relatively new to training models, so please let me know if I am making any obvious mistakes here (or if any other information is required).