UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

performance numbers on the model overview #2475

Open njjry opened 8 months ago

njjry commented 8 months ago

Hello,

I am reading this webpage https://sbert.net/docs/pretrained_models.html about model comparisons. I am a little confused by the "Performance Sentence Embeddings" and "Performance Semantic Search" results in the table. How are these two metrics measured? What do the numbers mean? Are they the result of comparing the similarity scores to some gold standard and seeing what percentage of the scores match?

Thanks, Lisa

tomaarsen commented 8 months ago

Hello!

First of all, "Performance Sentence Embeddings" refers to the performance of the model across a variety of tasks. Admittedly, I don't know exactly which ones were used, but I suspect it includes classification, clustering, semantic search, etc. "Performance Semantic Search" refers only to semantic search benchmarks, i.e. given a question or search query, how well the model can find relevant text passages through embedding similarity.
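To make "finding relevant passages through embedding similarity" concrete, here is a minimal sketch using the library's `util.semantic_search` helper. The corpus, query, and model name are placeholders for illustration, not the actual benchmark setup:

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder corpus and query; real benchmarks use much larger collections
corpus = [
    "A man is eating food.",
    "A man is riding a horse.",
    "The girl is carrying a baby.",
]
query = "What is the man eating?"

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus passages by cosine similarity to the query embedding
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])
```

A semantic search benchmark then checks whether the truly relevant passages end up at the top of this ranking.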

Secondly, the reported scores are most likely Spearman rank correlations based on cosine similarity, at least for the Semantic Search performance. In a nutshell, this measures how well the similarity of a pair of embeddings tracks the gold standard similarity score for those sentences. The correlation is higher when the predicted similarities and the gold scores agree more closely in their ranking, maxing out at 1 (or 100) if the two values are perfectly monotonically related. In other words, it measures how well a higher predicted similarity indeed corresponds to a higher gold label, giving some confidence that the model's similarity scores are correct/useful.
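Here is a minimal sketch of that computation; the sentence pairs, gold scores (on a hypothetical 0-5 scale), and model name are made up for illustration:

```python
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer, util

# Made-up sentence pairs with gold similarity labels on a 0-5 scale
sentences1 = ["A plane is taking off.", "A man is playing a flute.", "A woman is slicing an onion."]
sentences2 = ["An air plane is taking off.", "A man is playing a guitar.", "Someone is cutting a vegetable."]
gold_scores = [5.0, 1.9, 3.8]

model = SentenceTransformer("all-MiniLM-L6-v2")
emb1 = model.encode(sentences1, convert_to_tensor=True)
emb2 = model.encode(sentences2, convert_to_tensor=True)

# Cosine similarity of each pair, then rank correlation against the gold labels
cos_scores = util.cos_sim(emb1, emb2).diagonal().tolist()
correlation, _ = spearmanr(cos_scores, gold_scores)
print(correlation)  # 1.0 (reported as 100) if the two rankings agree perfectly
```

If you want to run this kind of evaluation yourself, the library's `EmbeddingSimilarityEvaluator` computes essentially this over a full evaluation set.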

For the "Performance Sentence Embeddings", I think it's very possible that the score is just an average of various different scores, even if they have different kinds of measurements. For example, just the average between an accuracy on a classification task, Spearman correlation for a semantic search task, Validity Measure for a clustering task, Normalized Discounted Cumulative Gain @ k for Retrieval, etc.