UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
15.12k stars 2.46k forks

Abbreviation and Synonym #1397

Open Threepointone4 opened 2 years ago

Threepointone4 commented 2 years ago

How can I support domain-specific abbreviations and synonyms when using a sentence transformer embedding model?

Fine-tuning would be one method. If so, can anyone tell me what the best approach would be? If there is any other quick workaround, that would be really helpful.

nreimers commented 2 years ago

We are currently working on this, but it is in an early research stage, so I would expect it to take 3-6 months before we have a good solution here.

Until then, the best way would be to have structured data on your domain, like question-answer pairs.

Otherwise, for unsupervised domain adaptation, you can have a look at: https://arxiv.org/abs/2112.07577

When you have a list of synonyms, you can create text pairs by manually replacing the words and then train on those pairs, i.e. you have pairs like ("This is my example", "This is my expl") if "expl" is an abbreviation for "example".
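The pair-generation step described above could be sketched roughly as follows. This is only an illustration under assumptions: the `SYNONYMS` mapping and the `make_pairs` helper are hypothetical names, not part of sentence-transformers, and a real domain list would be much larger.

```python
import re

# Hypothetical domain list: full form -> abbreviation or synonym.
SYNONYMS = {
    "example": "expl",
    "internet of things": "IoT",
}

def make_pairs(sentences, synonyms):
    """Return (original, rewritten) pairs, one per synonym found in a sentence."""
    pairs = []
    for sentence in sentences:
        for full, short in synonyms.items():
            # Whole-word, case-insensitive match of the full form.
            pattern = re.compile(r"\b" + re.escape(full) + r"\b", re.IGNORECASE)
            if pattern.search(sentence):
                # A function replacement avoids backslash handling in `short`.
                rewritten = pattern.sub(lambda m, s=short: s, sentence)
                pairs.append((sentence, rewritten))
    return pairs

pairs = make_pairs(["This is my example", "No match here"], SYNONYMS)
print(pairs)
```

Each pair could then be wrapped as `InputExample(texts=[original, rewritten])` and used as a positive pair with a loss such as `losses.MultipleNegativesRankingLoss`. Whether that loss is the best fit for a given domain is an assumption to verify.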

Threepointone4 commented 2 years ago

Understood, thanks. I will try this as well and let you know. One more option I was trying is converting every abbreviation into its full form, e.g. "what is IOT?" -> "what is internet of things?". This would happen at both indexing time and query time.
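A minimal sketch of that expansion step, assuming a hand-maintained lookup table. `ABBREVIATIONS` and `expand` are hypothetical names, and the same function would have to be applied to documents before indexing and to queries before encoding:

```python
import re

# Hypothetical lookup table: lowercase abbreviation -> full form.
ABBREVIATIONS = {
    "iot": "internet of things",
}

def expand(text, abbreviations=ABBREVIATIONS):
    """Replace each known abbreviation (case-insensitive) with its full form."""
    def replace_word(match):
        word = match.group(0)
        # Unknown words are returned unchanged.
        return abbreviations.get(word.lower(), word)
    return re.sub(r"\b\w+\b", replace_word, text)

print(expand("what is IOT?"))
```

Note that a plain dictionary lookup cannot disambiguate abbreviations that have several meanings in the same domain. Those cases would need extra context handling.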

Bogdan1001 commented 1 year ago

Hey, do you have any updates about this? Has the solution been delivered?