greeneggsandyaml opened this issue 1 day ago
Hello authors, thanks for your quick responses on my previous issues!

I'm opening a new issue to ask whether these are the right hyperparameters for training the bge-en-icl model. I'm finding that I can achieve pretty good performance, but not as good as the official bge-en-icl checkpoint. I'm using the following hyperparameters (here they are in a YAML file):

I set the target modules as q_proj, k_proj, v_proj, o_proj, down_proj, up_proj, and gate_proj, but this may not make a big difference.
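To be concrete, here's a minimal sketch of the LoRA setup I mean, using the standalone peft API rather than your training script (the `r` / `lora_alpha` / `lora_dropout` values below are illustrative placeholders, not my actual settings from the YAML):

```python
import torch
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

# Base decoder to wrap with LoRA (I'm on Mistral v0.3, as noted below).
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.3", torch_dtype=torch.bfloat16
)

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=64,              # placeholder rank, not my actual setting
    lora_alpha=32,     # placeholder
    lora_dropout=0.1,  # placeholder
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "down_proj", "up_proj", "gate_proj",     # MLP projections
    ],
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # sanity-check which weights will train
```

(Targeting all of the attention and MLP projections is the usual "full coverage" LoRA setup for Mistral-style models, which is part of why I doubt this choice explains the gap.)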
(I know I'm using Mistral v0.3 rather than v0.1; I don't think this makes much of a difference, although perhaps you know otherwise.)

Do you know what might be the cause of the discrepancy between the performance of my trained model and your pretrained model? I'm happy to provide any more details if you think they would be helpful.
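For reference, this is roughly how I've been sanity-checking my checkpoint against the official one on a toy example (a minimal sketch on my side: the last-token pooling and the `./my_checkpoint` path are my assumptions, not taken from your repo):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def rank_passages(model_name, query, passages):
    """Embed texts with last-token pooling (my understanding of how
    bge-en-icl pools) and rank passages by cosine similarity."""
    tok = AutoTokenizer.from_pretrained(model_name)
    tok.padding_side = "right"  # so the last-token index below is correct
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token
    model = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16).eval()

    texts = [query] + passages
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    last = batch["attention_mask"].sum(dim=1) - 1  # index of last real token
    emb = F.normalize(hidden[torch.arange(len(texts)), last], dim=-1)
    scores = (emb[0] @ emb[1:].T).tolist()
    return sorted(zip(scores, passages), reverse=True)

query = "What is LoRA fine-tuning?"
passages = [
    "LoRA injects trainable low-rank matrices into frozen weights.",
    "The Eiffel Tower is in Paris.",
]
for name in ["BAAI/bge-en-icl", "./my_checkpoint"]:  # second path is hypothetical
    print(name, rank_passages(name, query, passages))
```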
Thanks so much!
greeneggsandyaml

Reply:

The parameters seem fine. Is `total_max_len` derived from the query? How is the final model performing?