nithya-AK opened this issue 3 years ago
Hi @nithya-AK, my Ph.D. student is currently working on this, and we will integrate the code and publish a paper soon.
Sadly it is not straightforward and just running MLM is not sufficient.
Also, do you have a symmetric or an asymmetric semantic search use case? https://www.sbert.net/examples/applications/semantic-search/README.html#symmetric-vs-asymmetric-semantic-search
Oh, okay. Looking forward to it :) I have a symmetric semantic search use case as of now. Thanks a lot.
Hello again @nreimers, a quick query: doesn't this address the same use case? Would it be useful in my scenario?
@nithya-AK Just doing MLM does not yield good sentence / text embeddings. In fact, they are worse than more basic approaches like averaged GloVe embeddings.
What you need is either: 1) training data (i.e. data with some type of labels), or 2) a different pre-training objective. There is DeCLUTR https://arxiv.org/abs/2006.03659 and Contrastive Tension https://openreview.net/forum?id=Ov_sMNau-PF
These methods will soon be integrated into SBERT. Further, my student developed a better approach based on denoising decoders that beats previous approaches. This will also be integrated into SBERT soon.
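For option 1, here is a minimal sketch of what training with labeled data could look like using the standard sentence-transformers `fit()` API (this is not the denoising approach mentioned above; the checkpoint path and example pairs are hypothetical placeholders):

```python
# Sketch: supervised fine-tuning of an MLM-adapted RoBERTa for sentence embeddings.
# Paths and sentence pairs below are hypothetical; use labeled pairs from your domain.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

# Wrap the MLM-fine-tuned checkpoint with a mean-pooling layer.
word_embedding_model = models.Transformer("path/to/mlm-finetuned-roberta", max_seq_length=256)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Hypothetical positive pairs: sentences that should be close in embedding space.
train_examples = [
    InputExample(texts=["how do I reset my password", "password reset instructions"]),
    InputExample(texts=["update billing address", "change the address on my invoice"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# MultipleNegativesRankingLoss uses the other in-batch examples as negatives.
train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
model.save("output/domain-sbert")
```

MultipleNegativesRankingLoss fits when the labels are positive pairs; if you instead have graded similarity scores, `losses.CosineSimilarityLoss` would be the usual choice.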
Okay. Thanks again!
Hey! First of all, thank you for the awesome work you are doing. I would be grateful if you could help me out with the following situation: I have an unlabelled, domain-specific dataset and I want to do semantic search over it. I followed the Hugging Face tutorial and fine-tuned a RoBERTa model on my data with MLM. Now, how can I use this fine-tuned model together with the RoBERTa-base tokenizer in Sentence Transformers to generate sentence embeddings and later do a semantic search? I did try the examples provided, but I am not sure I follow them correctly.
Thanks in advance :)
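For reference, a minimal sketch of what this setup could look like with the sentence-transformers modular API. The checkpoint path, corpus, and query are hypothetical placeholders, and, as noted in the reply above, embeddings from a model trained only with MLM may be weak without further fine-tuning:

```python
# Sketch: wrap an MLM-fine-tuned RoBERTa checkpoint and run a symmetric semantic search.
# models.Transformer loads the model together with its tokenizer;
# models.Pooling mean-pools the token embeddings into one sentence vector.
from sentence_transformers import SentenceTransformer, models, util

word_embedding_model = models.Transformer("path/to/mlm-finetuned-roberta", max_seq_length=256)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Hypothetical domain corpus and query.
corpus = [
    "How to reset a forgotten password",
    "Changing the billing address on an invoice",
    "Exporting reports as CSV files",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embeddings = model.encode(["I forgot my password"], convert_to_tensor=True)

# util.semantic_search returns, per query, the top_k closest corpus entries.
hits = util.semantic_search(query_embeddings, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])
```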