Open ankitkr3 opened 3 years ago

Hi guys, thanks for your continuous support and work.

I am trying to find semantic similarity using the RoBERTa-large model, but I am getting an unnecessarily high score. For example:

Ideal text: The early explorers and traders shaped our history by changing the way indians lived and by learning about new land for the U.S. The traders shaped our history by changing indians traditions. For example the indians use to use every part of a buffalo. Then they started to kill buffalo only for their pelts so they could trade them with the traders. The explorers shaped our history by discovering Pikes Peak. If Pike never climbed pikes peak it probably wouldn't be named that. In conclusion, traders and explorers shaped our history.

Compared text: History

Score generated: 30% using cosine similarity.
Expected score: 0-5%.

Hi @ankitkr3, this is not how it works. Taking the raw score is not meaningful; you usually have to compare scores, i.e. cossim(A, B) vs. cossim(A, C). Further, if all embeddings lie in the positive part of the vector space, you would expect a cosine similarity of about 0.5 for two random points.

@nreimers, from where should I take cossim(A, C) when I only have two sentences? Please explain it a bit.

As mentioned, the scores by themselves are not meaningful; you cannot say whether 30% is a high or a low value. It only makes sense when you compare it with other examples.
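The comparative check described above can be sketched as follows. The embedding vectors here are made-up stand-ins purely for illustration; in practice they would come from something like `model.encode(text)` in sentence-transformers:

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; real ones would be produced by an
# embedding model, e.g. emb = model.encode(sentence).
emb_essay   = [0.8, 0.3, 0.5]   # A: the essay about explorers/traders
emb_history = [0.5, 0.4, 0.6]   # B: "History"
emb_random  = [0.2, 0.9, 0.1]   # C: some unrelated sentence

sim_ab = cos_sim(emb_essay, emb_history)
sim_ac = cos_sim(emb_essay, emb_random)

# The absolute values are not meaningful on their own -- note that
# all-positive vectors like these can never score below 0, so a
# "30%" similarity may still be near the random baseline. What is
# meaningful is the comparison: is B closer to A than C is?
print(f"cossim(A, B) = {sim_ab:.2f}")
print(f"cossim(A, C) = {sim_ac:.2f}")
if sim_ab > sim_ac:
    print("B ranks as more similar to A than C does")
```

The takeaway matches the reply above: use the scores to rank candidates against each other, not as a standalone percentage of "how similar" two texts are.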