KeshavSingh29 opened 11 months ago
Hello!
Please let me know if I'm understanding you correctly. You did the following: 1) fine-tuned multilingual BERT X on your dataset Y -> worked well; 2) fine-tuned multilingual BERT Z on the same dataset Y -> poor performance.
By "another instance of multilingual bert", do you mean another base model, e.g. https://huggingface.co/bert-base-multilingual-cased for the first try and https://huggingface.co/bert-base-multilingual-uncased for the second try, or do you mean another fresh instantiation of the same model? If you mean the former, then it could be that the second BERT model is not as good as the first, or that it doesn't align with your domain-specific data as well. If it is the latter, then the two training setups should be identical, right? Then we wouldn't expect any differences other than variation from randomness.
@tomaarsen Thanks for your quick response.
Sorry for my unclear explanation. I mean fine-tuning two fresh instantiations of the same model (the model I'm using is multilingual-bert-uncased).
My training samples are the same, except that in the DataLoader I set shuffle=True.
Very interesting. And the performance difference is notable? Running the same training setup multiple times should only result in slight variations, exactly because of sources of randomness like shuffle=True.
To try and narrow down whether it's indeed caused by randomness, you can use set_seed, imported from transformers. It should make most sources of randomness identical between runs.
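transformers' set_seed seeds Python's random module, NumPy, and PyTorch in one call. The effect on something like DataLoader shuffling can be sketched with the standard library alone (toy indices, not the actual training loop):

```python
import random

def shuffled_sample_order(seed):
    """Toy stand-in for a DataLoader with shuffle=True: with a fixed
    seed, the shuffle order is identical on every run."""
    random.seed(seed)  # the part of set_seed that covers Python's random module
    indices = list(range(10))
    random.shuffle(indices)
    return indices

# Same seed -> same sample order, so two runs become directly comparable.
assert shuffled_sample_order(42) == shuffled_sample_order(42)
```

In a real run, calling set_seed(42) once before building the model and DataLoader plays the same role for all three RNG sources at once.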
@tomaarsen Thanks a lot for the advice.
Using set_seed fixed the variation in accuracy across different fine-tuned model instantiations.
However, now I'm interested in understanding how I can get the most out of fine-tuning.
For instance, when not using set_seed, I could get accuracy varying from 80% to 90% across different fine-tuned models.
It would be great to consistently get 90% accuracy, or even more if that's possible.
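One common way to exploit that 80–90% spread is to treat the seed itself as a hyperparameter: fine-tune once per seed, evaluate each run, and keep the best checkpoint. A minimal sketch of that loop, where train_and_evaluate is a hypothetical placeholder for a full fine-tune plus validation:

```python
import random

def train_and_evaluate(seed):
    # Hypothetical placeholder: in practice this would call set_seed(seed),
    # fine-tune a fresh model instance, and return its validation accuracy.
    random.seed(seed)
    return 0.80 + 0.10 * random.random()  # mimic the observed 80-90% spread

# Sweep a handful of seeds and keep the best-performing one.
results = {seed: train_and_evaluate(seed) for seed in range(5)}
best_seed = max(results, key=results.get)
print(f"best seed: {best_seed}, accuracy: {results[best_seed]:.3f}")
```

Since each run is seeded, the winning run can be reproduced exactly by fine-tuning once more with best_seed.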
Can you share your training process?
Firstly, thanks for providing such an amazing library.
Background: I have seen a lot of interest in multilingual cross-encoders, and although the amberoad model trained on MS MARCO data is available, it does not perform well on domain-specific data.
Problem: I tried fine-tuning a multilingual BERT model in a cross-encoder setup with my own dataset. It worked great the first time, but if I load a new model and fine-tune again, the performance drops. I'm not fine-tuning the same model again, but rather fine-tuning another instance of multilingual BERT.
What am I missing here? Is there some way I can make sure the fine-tuning results remain consistent?