okgrammer opened this issue 5 years ago
Same question.
This project does not support training; do you know of other code for sentence embeddings?
You could try to use BERT.
It seems BERT cannot output good sentence embeddings: https://arxiv.org/abs/1904.07531v4
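For context, "out of the box" here usually means pooling BERT's token vectors directly. A minimal sketch of that baseline with the HuggingFace `transformers` package (not part of this repo; the model name and sentences are only illustrative):

```python
# Naive "out of the box" BERT sentence embeddings via mean pooling.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["A quick example sentence.", "Another one."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

# Mean-pool over real tokens only, ignoring padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (token_embeddings * mask).sum(1) / mask.sum(1)
print(sentence_embeddings.shape)  # torch.Size([2, 768])
```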
BERT out of the box does not yield good embeddings, that's true. But with some fine-tuning it can give you really nice embeddings.
See https://github.com/UKPLab/sentence-transformers for how to fine-tune BERT to give you good sentence embeddings.
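For example, once you have a fine-tuned model, getting embeddings from `sentence-transformers` is a few lines. The checkpoint name below is one of the pretrained models that repo ships and is only illustrative; substitute your own fine-tuned model:

```python
from sentence_transformers import SentenceTransformer

# Load a pretrained (already fine-tuned) sentence-embedding model.
model = SentenceTransformer("bert-base-nli-mean-tokens")

# Each sentence becomes one fixed-size vector.
embeddings = model.encode(["This is an example sentence.",
                           "Each sentence becomes one vector."])
print(embeddings.shape)  # (2, 768)
```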
Thank you very much!!
Ironically, BERT is giving me significantly better results out of the box, and with a night of fine-tuning on a GTX 1060 it works even better.
Hello, are the results you are mentioning for English only, or do they involve some (zero-shot) transfer to other languages? If only English is needed, then there are indeed several other approaches that you may want to compare for your task. LASER focuses on multilingual sentence embeddings which work well for many languages without the need to fine-tune them.
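Since the thread keeps contrasting LASER with fine-tuned alternatives, here is a minimal sketch of getting multilingual LASER embeddings with no training at all, assuming the third-party `laserembeddings` pip package rather than this repo's own scripts (run `python -m laserembeddings download-models` once to fetch the weights):

```python
from laserembeddings import Laser

laser = Laser()

# One shared encoder covers many languages; no per-language fine-tuning.
embeddings = laser.embed_sentences(
    ["Hello world", "Hallo Welt", "Bonjour le monde"],
    lang=["en", "de", "fr"],  # language codes guide tokenization
)
print(embeddings.shape)  # (3, 1024)
```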
There's https://github.com/transducens/LASERtrain (an approximation; the models are not inter-compatible with the official ones).
Hello, we are aware that there's a lot of interest in the training code. The original LASER training code was based on a version of fairseq which now dates back almost a year. We are working on a substantially improved version of LASER training which will use the current fairseq and scale much better to many languages. Please be patient :-)
> There's https://github.com/transducens/LASERtrain (an approximation; the models are not inter-compatible with the official ones).

Have you been able to run their code? I run into errors when I run it: `indices, ignored = _filter_by_size_dynamic()` raising `AttributeError: 'function' object has no attribute 'size'`.
> Hello, we are aware that there's a lot of interest in the training code. The original LASER training code was based on a version of fairseq which now dates back almost a year. We are working on a substantially improved version of LASER training which will use the current fairseq and scale much better to many languages. Please be patient :-)

Hello, how is this project going?
any update on this? I am very interested in training my own models too.
I have used this code for training. It can achieve similar performance.
Hi @sebastian-nehrdich
It is not the LASER training code, but if you are open to other multilingual sentence embedding training methods that work better than LASER for several tasks: https://github.com/UKPLab/sentence-transformers/blob/master/docs/training/multilingual-models.md
Details can be found in this paper: Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation (https://arxiv.org/abs/2004.09813).
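For reference, a minimal sketch of the distillation recipe that document describes: an English teacher produces target vectors, and a multilingual student is trained to match them on parallel data. This assumes a `sentence-transformers` install from around that time; the model names and `parallel-sentences.tsv` (tab-separated English/translation pairs) are placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses
from sentence_transformers.datasets import ParallelSentencesDataset

# English teacher with good monolingual sentence embeddings.
teacher = SentenceTransformer("bert-base-nli-stsb-mean-tokens")

# Multilingual student: pretrained transformer + mean pooling.
word_embedding_model = models.Transformer("xlm-roberta-base")
pooling = models.Pooling(word_embedding_model.get_word_embedding_dimension())
student = SentenceTransformer(modules=[word_embedding_model, pooling])

# Parallel data: the teacher embeds the English side, and those vectors
# become regression targets for the student on both sides of each pair.
train_data = ParallelSentencesDataset(student_model=student, teacher_model=teacher)
train_data.load_data("parallel-sentences.tsv")  # placeholder path

train_dataloader = DataLoader(train_data, shuffle=True, batch_size=32)
train_loss = losses.MSELoss(model=student)  # student mimics teacher vectors

student.fit(train_objectives=[(train_dataloader, train_loss)],
            epochs=1, warmup_steps=100)
```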
Is there any plan to release the code/scripts for training the encoder? I would like to train using my own data. Thanks!