Open PhilipMay opened 1 year ago
This is also connected to something strange I observed at SetFit here: https://github.com/huggingface/setfit/issues/135#issuecomment-1297000383
Maybe @nreimers could comment on this? :-)
I think it depends on the loss function. For example, if you use triplet loss with a Manhattan or Euclidean distance metric, those metrics will work better than cosine at evaluation time.
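To make the point concrete: triplet loss pulls the positive closer to the anchor than the negative under whichever distance metric you pick, so the embedding space is shaped for that metric. Below is a minimal NumPy sketch of the triplet loss objective (the function name `triplet_loss` and the toy vectors are illustrative, not from sentence-transformers):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, metric="euclidean", margin=1.0):
    """Triplet loss: require the positive to be closer to the anchor than
    the negative by at least `margin`, under the chosen distance metric."""
    if metric == "euclidean":
        d_pos = np.linalg.norm(anchor - positive)
        d_neg = np.linalg.norm(anchor - negative)
    elif metric == "manhattan":
        d_pos = np.abs(anchor - positive).sum()
        d_neg = np.abs(anchor - negative).sum()
    elif metric == "cosine":
        # cosine *distance* = 1 - cosine similarity
        cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        d_pos = 1.0 - cos(anchor, positive)
        d_neg = 1.0 - cos(anchor, negative)
    else:
        raise ValueError(metric)
    return max(d_pos - d_neg + margin, 0.0)

# Toy embeddings: the positive sits near the anchor, the negative far away.
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
n = np.array([0.0, 1.0])
for m in ("euclidean", "manhattan", "cosine"):
    print(m, triplet_loss(a, p, n, metric=m))
```

Notice that the same triplet can already satisfy the margin under one metric while still incurring loss under another, which is why the training metric and the evaluation metric should normally match.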
Hi, I am using
MultipleNegativesRankingLoss
to train a German BERT model (deepset/gbert-base) on German sentence pairs. During training I am evaluating on German STSb data. My observation is this:
It seems like Manhattan distance and Euclidean distance are better distance metrics than cosine similarity.
For me, this result is really strange. Doesn't this indirectly mean that it would be better to use Manhattan or Euclidean distance in the loss function as well, during training? The only problem: these distances are not normalized between 0 and 1 like cosine similarity.
Is there a solution & explanation for this?
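For context on what the evaluation numbers mean: the STSb evaluator ranks sentence pairs by each similarity score and reports the rank correlation with the gold scores. A minimal sketch of that comparison with toy embeddings (the helpers `spearman` and `score_pairs` and the fake `gold` scores are illustrative stand-ins, not sentence-transformers' own code):

```python
import numpy as np

def spearman(x, y):
    # Spearman rank correlation = Pearson correlation on ranks
    # (no tied values assumed in this toy example).
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

def score_pairs(emb1, emb2, metric):
    """Per-pair similarity scores; distances are negated so that
    'higher = more similar' holds for every metric."""
    if metric == "cosine":
        num = (emb1 * emb2).sum(axis=1)
        den = np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
        return num / den
    if metric == "euclidean":
        return -np.linalg.norm(emb1 - emb2, axis=1)
    if metric == "manhattan":
        return -np.abs(emb1 - emb2).sum(axis=1)
    raise ValueError(metric)

# Toy stand-ins for sentence-pair embeddings and gold STS scores.
rng = np.random.default_rng(0)
emb1 = rng.normal(size=(8, 4))
emb2 = emb1 + rng.normal(scale=0.3, size=(8, 4))
gold = -np.linalg.norm(emb1 - emb2, axis=1)  # pretend gold tracks distance

for m in ("cosine", "euclidean", "manhattan"):
    print(m, round(spearman(score_pairs(emb1, emb2, m), gold), 3))
```

Because Spearman correlation only depends on the ranking of the scores, the fact that Manhattan and Euclidean distances are unbounded does not hurt the evaluation; the normalization question only matters for the training loss.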