Open MastafaF opened 4 years ago
Hi @MastafaF The best number of epochs depends on the model and, even more importantly, on your dataset. For NLI data, I found 1 epoch often sufficient. For smaller datasets, like STSbenchmark, 3 or 4 epochs worked well.
Usually I evaluate the model every 1000 steps on the dev set. If the performance improves, a checkpoint is stored. With this technique, you are not so dependent on setting the epoch number exactly right: you just pick a number that is large enough (usually between 1 and 5), and the best model, based on the dev-set score, is stored automatically.
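The keep-the-best-checkpoint idea above can be sketched in a few lines. This is only an illustrative skeleton, not the sentence-transformers API; `evaluate` and `save_checkpoint` are hypothetical callbacks standing in for a real dev-set evaluator and model serialization.

```python
# Hedged sketch: evaluate every `eval_steps` training steps and keep only
# the checkpoint with the best dev score, so the exact epoch count
# matters less. `evaluate(step)` returns a dev-set score and
# `save_checkpoint(step)` persists the model (both are placeholders).

def train_with_checkpointing(num_steps, eval_steps, evaluate, save_checkpoint):
    best_score = float("-inf")
    best_step = None
    for step in range(1, num_steps + 1):
        # ... one optimization step would happen here ...
        if step % eval_steps == 0:
            score = evaluate(step)
            if score > best_score:      # dev performance improved
                best_score = score
                best_step = step
                save_checkpoint(step)   # store the new best checkpoint
    return best_step, best_score
```

In sentence-transformers this roughly corresponds to passing a dev-set evaluator and an `evaluation_steps` value to `model.fit`, which then writes the best model to the output path.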
Best Nils Reimers
Great, thanks for the insight @nreimers. I trained camemBERT with your Siamese network method on additional French (X)NLI datasets. I would love to share it and contribute to this great project. The best model has been saved and can be shared.
Cheers,
Hi @MastafaF If you send me the download link to the model (reimers@ukp.tu-darmstadt.de), I can evaluate it on some French data and add it to the repository here. Then people could easily download and use it.
Best Nils Reimers
Hi @nreimers ,
Sure! I could also open a pull request, if that suits you better.
Would you mind sharing your own evaluation algorithm? I am currently evaluating on NLI data.
Cheers,
Hi @MastafaF I was using an extension of the multilingual STS 2017 dataset: https://github.com/UKPLab/sentence-transformers/blob/master/docs/pretrained-models/multilingual-models.md
Currently I am extending it to also cover bitext mining (e.g., finding identical sentences across English and French).
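The bitext-mining step mentioned above can be sketched as a mutual-nearest-neighbour search over sentence embeddings. This is a toy illustration, not the actual evaluation code: real embeddings would come from the sentence encoder, while here they are plain lists of floats.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mine_bitext(src_embs, tgt_embs):
    """Return (src_idx, tgt_idx) pairs that are mutual nearest neighbours
    by cosine similarity -- a simple proxy for 'same sentence in both
    languages'."""
    src_best = [max(range(len(tgt_embs)), key=lambda j: cosine(s, tgt_embs[j]))
                for s in src_embs]
    tgt_best = [max(range(len(src_embs)), key=lambda i: cosine(src_embs[i], t))
                for t in tgt_embs]
    return [(i, j) for i, j in enumerate(src_best) if tgt_best[j] == i]
```

Keeping only mutual nearest neighbours filters out many false matches, since a spurious pair is rarely each other's top hit in both directions.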
As I would only need to upload the model to our FTP server, a download link would be easier than a PR.
Best Nils
Great! I sent you a mail with a link to the French model. 😀
A simple evaluation could be a similarity search comparing the original camemBERT, multilingual_sentence_transformers (your implementation of the multilingual models), and the model I just sent you by mail, which follows your paradigm with camemBERT.
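The usual protocol for comparing such models on STS-style data is to compute a cosine similarity per sentence pair with each model and then measure the Spearman rank correlation against the gold similarity scores. Below is a minimal, ties-ignoring Spearman sketch in plain Python (the real evaluation would typically use a library implementation such as `scipy.stats.spearmanr`).

```python
import math

def spearman(xs, ys):
    """Spearman rank correlation between predicted similarities `xs`
    and gold scores `ys` (simplified: ties are not handled)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    # Pearson correlation of the ranks
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    vy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (vx * vy)
```

Each model under comparison then gets one Spearman score; a higher correlation means its embedding similarities better match human judgments.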
Hi @nreimers,
Any feedback on the evaluation of French sentence-BERT?
@MastafaF could you please share the similarity_evaluation_results.csv file generated while fine-tuning sentence-transformers? I am also trying to fine-tune it and would like to compare my progress with yours.
Hi,
Given a model in {BERT, XLM, XLNet, ...}, do you have a dictionary of estimated best epoch counts for training your Siamese network on the NLI dataset?
Otherwise, what would be your suggestion? (Other than just trying different epoch settings, since that takes a lot of compute time 😞)
That would be very useful for other users as well I think.
Cheers and great job! :D