SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
https://arxiv.org/abs/2309.12871
MIT License
397 stars 30 forks

A little doubt about the paper #55

Closed Night-Quiet closed 3 months ago

Night-Quiet commented 4 months ago

[image]

From a code perspective, the paper concludes by adding up all values of the complex loss function.

[image]

This is the standard complex division formula and its transformation. The purpose of the paper is to obtain the content of the red box, but the code ultimately sums everything up, as shown in the following figure:

[image]

Is this the desired result of the paper? Could you clarify? Thank you.

SeanLee97 commented 3 months ago

hi @Night-Quiet, sorry for the delayed reply.

Thank you for providing the clear formula derivation in polar coordinates. It appears to be correct.

To accumulate the angle differences, we sum them up. In polar coordinates, the result is indeed $\sqrt{2}\sin(\Delta)$, where $\Delta = \theta_i - \theta_j + \frac{\pi}{4}$. However, supposing $\sin(\Delta) = x$, the desired result in polar coordinates would be $\Delta = \arcsin(x)$, which is hard to implement in code. Thus, taking practical considerations into account, we use an approximate calculation of the angle difference, as demonstrated in the paper. Under this approach, the operation `sum(y_pred)` serves as a pooling operation; it could equally be a mean or another type of pooling. This pooling step is necessary to compute the final loss.
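As a rough numerical check of the identity discussed above, the sketch below compares the sum of the real and imaginary parts of a normalized complex quotient with $\sqrt{2}\sin(\theta_i - \theta_j + \frac{\pi}{4})$. The function names `normalized_quotient_sum`, `polar_form_sum`, and `pooled_sum` are hypothetical and chosen for illustration only; this is not the repository's implementation, and it assumes a plain sum as the pooling step.

```python
import cmath
import math


def normalized_quotient_sum(z: complex, w: complex) -> float:
    """Real part plus imaginary part of (z / w) normalized to unit modulus."""
    q = z / w
    q = q / abs(q)  # keep only the angle information
    return q.real + q.imag


def polar_form_sum(z: complex, w: complex) -> float:
    """The same quantity written as sqrt(2) * sin(theta_i - theta_j + pi/4)."""
    delta = cmath.phase(z) - cmath.phase(w) + math.pi / 4
    return math.sqrt(2) * math.sin(delta)


def pooled_sum(zs, ws) -> float:
    """Sum pooling over all complex dimensions (an assumed pooling choice)."""
    return sum(normalized_quotient_sum(z, w) for z, w in zip(zs, ws))


# Illustrative values only; they are not taken from the paper or the code base.
zs = [1 + 2j, -0.5 + 0.3j, 2 - 1j]
ws = [3 - 1j, 0.7 + 0.7j, -1 - 2j]

for z, w in zip(zs, ws):
    print(normalized_quotient_sum(z, w), polar_form_sum(z, w))

print("pooled:", pooled_sum(zs, ws))
```

Each printed pair should agree up to floating-point error, which is the sense in which the summed real and imaginary parts encode $\sin(\Delta)$ rather than $\Delta$ itself.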

The reason for computing the normalized angle difference is to create a more intuitive similarity measure than cosine similarity. In this context, a smaller angle difference indicates greater similarity.

Night-Quiet commented 3 months ago

Thank you for your reply. I think I understand what you mean.