UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Fine-tuning multilingual bert Cross Encoder #2354

Open KeshavSingh29 opened 11 months ago

KeshavSingh29 commented 11 months ago

Firstly, thanks for providing such an amazing library.

Background: I have seen a lot of interest in multilingual cross encoders, and although the amberoad model trained on MS MARCO data is available, it does not perform well on domain-specific data.

Problem: I tried fine-tuning a multilingual BERT model in a cross encoder setup with my own dataset. It worked great the first time, but if I load a new model and fine-tune again, the performance drops. I'm not fine-tuning the same model again, but rather fine-tuning another instance of multilingual BERT.

What am I missing here? Is there some way I can make sure the fine-tuning results remain consistent?
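
For reference, a minimal sketch of the kind of setup I mean, using the CrossEncoder.fit API (the model name, sample pairs, and hyperparameters below are placeholders, not my exact values):

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Placeholder domain-specific (query, passage) pairs with binary relevance labels
train_samples = [
    InputExample(texts=["query 1", "a relevant passage"], label=1.0),
    InputExample(texts=["query 1", "an irrelevant passage"], label=0.0),
    # ... more pairs ...
]

train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)

# Fresh multilingual BERT instance with a single relevance output
model = CrossEncoder("bert-base-multilingual-uncased", num_labels=1)
model.fit(
    train_dataloader=train_dataloader,
    epochs=2,
    warmup_steps=100,
    output_path="mbert-cross-encoder-finetuned",
)
```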

tomaarsen commented 11 months ago

Hello!

Please let me know if I'm understanding you correctly. You did the following:

1. Fine-tune multilingual BERT X on your dataset Y -> worked well.
2. Fine-tune multilingual BERT Z on your dataset Y -> poor performance.

By "another instance of multilingual bert", do you mean another base model, e.g. https://huggingface.co/bert-base-multilingual-cased for the first try and https://huggingface.co/bert-base-multilingual-uncased for the second try, or do you mean another fresh instantiation of the same model? If you mean the former, then it could be that the second BERT model is not as good as the first, or that it doesn't align with your domain specific data as well. If it is the latter, then the two training setups should be identical right? Then we wouldn't expect any differences other than variation from randomness.

KeshavSingh29 commented 11 months ago

@tomaarsen Thanks for your quick response. Sorry for my unclear explanation. I mean fine-tuning two fresh instantiations of the same model (the model I'm using is multilingual-bert-uncased). My training samples are the same, except that in the DataLoader I set shuffle=True.

tomaarsen commented 11 months ago

Very interesting. And the performance difference is notable? Running the same training setup multiple times should only result in slight variations, exactly because of factors like shuffle=True. To try to narrow down whether it's indeed caused by randomness, you can use set_seed, imported from transformers. It should keep most sources of randomness identical between runs.
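
For example, something along these lines (a minimal sketch with placeholder data; the key point is calling set_seed before creating the model and the DataLoader):

```python
from transformers import set_seed
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Seed Python, NumPy, and PyTorch RNGs *before* model init and DataLoader shuffling,
# so the new classification head's weight init and the batch order are repeatable.
set_seed(42)

train_samples = [  # placeholder pairs; use the real domain-specific data here
    InputExample(texts=["query 1", "a relevant passage"], label=1.0),
    InputExample(texts=["query 1", "an irrelevant passage"], label=0.0),
]

model = CrossEncoder("bert-base-multilingual-uncased", num_labels=1)
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=100)
```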

KeshavSingh29 commented 11 months ago

@tomaarsen Thanks a lot for the advice. Using set_seed fixed the variation in accuracy across different fine-tuned model instantiations.

However, now I'm interested in understanding how I can get the most out of fine-tuning. For instance, when not using set_seed, I could get accuracy varying from 80% to 90% across different fine-tuned models. It would be great to reach 90% accuracy or even more, if that's possible.
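
(A rough way to see how much of that 80%–90% spread is seed variance versus real headroom is to train several seeded runs and score each on a held-out dev set. The sketch below assumes binary relevance labels and the CEBinaryClassificationEvaluator, which may not match the exact setup in this issue; the sample lists are placeholders.)

```python
from torch.utils.data import DataLoader
from transformers import set_seed
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CEBinaryClassificationEvaluator

# Placeholder data; replace with real domain-specific pairs and a proper held-out split
train_samples = [InputExample(texts=["q", "relevant passage"], label=1.0),
                 InputExample(texts=["q", "irrelevant passage"], label=0.0)]
dev_samples = list(train_samples)

dev_evaluator = CEBinaryClassificationEvaluator.from_input_examples(dev_samples, name="dev")

scores = {}
for seed in (0, 1, 2, 3, 4):
    set_seed(seed)
    model = CrossEncoder("bert-base-multilingual-uncased", num_labels=1)
    train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)
    model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=100)
    scores[seed] = dev_evaluator(model)  # average precision on the dev pairs

print(scores)  # compare runs and keep the best checkpoint
```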

wilfoderek commented 8 months ago

> If you mean the former, then it could be that the second BERT model is not as good as the first, or that it doesn't align with your domain-specific data as well. If it is the latter, …

Can you share your training process?