Open MastafaF opened 4 years ago
Hi @MastafaF The best number of epochs depends on the model and, even more importantly, on your dataset. For NLI data, I found 1 epoch often sufficient. For smaller datasets, like STSbenchmark, 3 or 4 epochs worked well.
Usually I evaluate the model every 1000 steps on the dev set. If the performance improves, a checkpoint is stored. With this technique, you are not so dependent on setting the epoch number exactly right: you just pick a number that is large enough (usually between 1 and 5), and the best model, based on the dev-set score, is stored automatically.
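The keep-the-best-checkpoint idea above can be sketched in a few lines. This is only an illustrative skeleton, not the sentence-transformers API; `evaluate` and `save_checkpoint` are hypothetical callbacks standing in for a real dev-set evaluator and model serialization.

```python
# Hedged sketch: evaluate every `eval_steps` training steps and keep only
# the checkpoint with the best dev score, so the exact epoch count
# matters less. `evaluate(step)` returns a dev-set score and
# `save_checkpoint(step)` persists the model (both are placeholders).

def train_with_checkpointing(num_steps, eval_steps, evaluate, save_checkpoint):
    best_score = float("-inf")
    best_step = None
    for step in range(1, num_steps + 1):
        # ... one optimization step would happen here ...
        if step % eval_steps == 0:
            score = evaluate(step)
            if score > best_score:      # dev performance improved
                best_score = score
                best_step = step
                save_checkpoint(step)   # store the new best checkpoint
    return best_step, best_score
```

In sentence-transformers this roughly corresponds to passing a dev-set evaluator and an `evaluation_steps` value to `model.fit`, which then writes the best model to the output path.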
Best Nils Reimers
Great, thanks for the insight @nreimers. I trained camemBERT with your Siamese network method on additional French (X)NLI datasets. I would love to share it and contribute to this great project. The best model has been saved and can be shared.
Cheers,
Hi @MastafaF If you send me the download link to the model (reimers@ukp.tu-darmstadt.de), I can evaluate it on some French data and add it to the repository here. Then people could easily download and use it.
Best Nils Reimers
Hi @nreimers ,
Sure! I could also open a pull request, if that suits you better.
Would you mind sharing your own evaluation algorithm? I am currently evaluating on NLI data.
Cheers,
Hi @MastafaF I was using an extension of the multilingual STS 2017 dataset: https://github.com/UKPLab/sentence-transformers/blob/master/docs/pretrained-models/multilingual-models.md
Currently I am extending it to also cover bitext mining (e.g., finding identical sentences across English and French).
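The bitext-mining step mentioned above can be sketched as a mutual-nearest-neighbour search over sentence embeddings. This is a toy illustration, not the actual evaluation code: real embeddings would come from the sentence encoder, while here they are plain lists of floats.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mine_bitext(src_embs, tgt_embs):
    """Return (src_idx, tgt_idx) pairs that are mutual nearest neighbours
    by cosine similarity -- a simple proxy for 'same sentence in both
    languages'."""
    src_best = [max(range(len(tgt_embs)), key=lambda j: cosine(s, tgt_embs[j]))
                for s in src_embs]
    tgt_best = [max(range(len(src_embs)), key=lambda i: cosine(src_embs[i], t))
                for t in tgt_embs]
    return [(i, j) for i, j in enumerate(src_best) if tgt_best[j] == i]
```

Keeping only mutual nearest neighbours filters out many false matches, since a spurious pair is rarely each other's top hit in both directions.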
As I would only need to upload the model to our FTP server, a download link would be easier than a PR.
Best Nils
Great! I sent you a mail with a link to the French model. 😀
A simple evaluation could be a similarity search comparing the original camemBERT, multilingual_sentence_transformers (your implementation of the multilingual models), and the model I just sent you by mail, which follows your paradigm with camemBERT.
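The usual protocol for comparing such models on STS-style data is to compute a cosine similarity per sentence pair with each model and then measure the Spearman rank correlation against the gold similarity scores. Below is a minimal, ties-ignoring Spearman sketch in plain Python (the real evaluation would typically use a library implementation such as `scipy.stats.spearmanr`).

```python
import math

def spearman(xs, ys):
    """Spearman rank correlation between predicted similarities `xs`
    and gold scores `ys` (simplified: ties are not handled)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    # Pearson correlation of the ranks
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    vy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (vx * vy)
```

Each model under comparison then gets one Spearman score; a higher correlation means its embedding similarities better match human judgments.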
Hi @nreimers,
Any feedback on the evaluation of French sentence-BERT?
@MastafaF could you please share the similarity_evaluation_results.csv file generated while fine-tuning sentence-transformers? I am also trying to fine-tune it and would like to compare my progress with yours.
Hi,
Given a model in {BERT, XLM, XLNet, ...}, do you have a dictionary of estimated best epoch counts for training your Siamese network on the NLI dataset?
Otherwise, what would be your suggestion? (Other than just trying different epoch settings, since that takes a lot of compute time 😞)
That would be very useful for other users as well I think.
Cheers and great job! :D