UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

Explainability of sbert for sentence similarity #1805

Open amitkayal opened 1 year ago

amitkayal commented 1 year ago

Hello,

I am trying to use Transformers Interpret with SBERT, but I am not able to visualize the output to understand how SBERT compares multiple sentences for similarity, or which keywords contributed to the comparison.

Do we have any documentation available for this? Thanks

bm777 commented 1 year ago

I would rather say that SBERT's similarity function computes the distance between d-dimensional vectors. Before calculating any distance, SBERT encodes the query and the passages into vectors of real values using an encoder (a bi-encoder). For many models the dimension is d = 768, so each sentence becomes a 768-dimensional vector.
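The distance calculation described above can be sketched with plain NumPy; the 768-dimensional random vectors below are stand-ins for real SBERT embeddings:

```python
# Illustrative sketch (not SBERT's internal code): cosine similarity
# between one query vector and several passage vectors with d = 768.
# Random vectors stand in for actual SBERT sentence embeddings.
import numpy as np

rng = np.random.default_rng(0)
d = 768
query = rng.standard_normal(d)          # embedding of the query
passages = rng.standard_normal((3, d))  # embeddings of 3 passages

def cos_sim(a, b):
    """Cosine similarity between vector a and each row of matrix b."""
    return (b @ a) / (np.linalg.norm(b, axis=1) * np.linalg.norm(a))

scores = cos_sim(query, passages)  # one similarity score per passage
```

A higher score means the passage vector points in a more similar direction to the query vector; with real embeddings, that corresponds to closer meaning.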

The similarity result is obtained by ranking the computed distances and retrieving the top-k passages related to the query; that is semantic search. Your question is framed around keyword matching, which is lexical search, and in real use cases it has weaknesses, such as not accounting for synonyms and antonyms. I don't know why you are using keyword matching, but I would try to understand semantic search rather than lexical search.