Closed: cabhijith closed this issue 4 years ago
Hi @cabhijith Yes, please report back your results.
I think it will not work that well. To be usable in an unsupervised fashion, the dimensions must fulfill certain properties: all dimensions count equally and carry the same weight when you compute e.g. cosine similarity. Here, BERT sadly does not produce well-working results.
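To make the cosine-similarity point concrete, here is a minimal sketch in plain NumPy: mean-pooling token vectors into a sentence vector and comparing with cosine similarity, which weights every dimension equally. The random arrays are made-up stand-ins for BERT token embeddings, not real model output.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Average token vectors into one sentence vector ([tokens, dim] -> [dim])."""
    return token_embeddings.mean(axis=0)

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity treats all dimensions with equal weight."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for BERT token embeddings (real BERT vectors are 768-dim).
rng = np.random.default_rng(0)
sent_a = mean_pool(rng.normal(size=(5, 8)))  # 5 tokens, 8 dims
sent_b = mean_pool(rng.normal(size=(7, 8)))  # 7 tokens, 8 dims

print(cosine_sim(sent_a, sent_b))
print(cosine_sim(sent_a, sent_a))  # identical vectors -> 1.0
```

If some dimensions of the raw BERT vectors are dominated by properties irrelevant to semantics, this unweighted comparison picks them up just as strongly, which is the concern above.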
What could be interesting for you are unsupervised sentence embedding methods, like Sent2Vec: https://github.com/epfml/sent2vec
They train the sentence embeddings on raw text alone. If you have a really specific domain, this might yield better-quality sentence embeddings.
Will check that out. Also, maybe try the approach of fine-tuning BERT on our dataset and then feeding it into SBERT, as in training_nli.py. Maybe that could work?
Yes, it might help.
Thanks!
Hi, I have been using sentence-transformers for quite some time and love it! It works really well for standard English tasks. Now I want to use it for something much more domain-specific, but sentence-transformers did not yield good results there. Unfortunately, we do not have the resources to label sentences for SBERT.
Do you think that the average embedding of a BERT model tuned on our data will outperform SBERT for semantic similarity?
Will report back with whatever results I get. Just wanted to open a discussion.