Closed: Wentao-Xu closed this issue 3 years ago
Hi @Wentao-Xu ,
We use the formulation Re(\overline{h}Rt^\top) throughout the paper for notational convenience. In fact, ComplEx can be implemented with either formulation. The key to ComplEx is its antisymmetric score function, not where the conjugation is placed. Since both the real and the imaginary parts of the embeddings are learnable parameters, implementations based on either formulation achieve the same performance.
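The equivalence claimed above can be checked numerically. The sketch below (hypothetical 2-dimensional embeddings, plain Python complex numbers; the function names are made up for illustration) shows that the two formulations give different scores on the same embeddings, but identical scores once the imaginary parts of the entity embeddings are re-parameterized (negated):

```python
def score_complex(h, r, t):
    """Original ComplEx: Re(<h, r, conj(t)>)."""
    return sum((hk * rk * tk.conjugate()).real for hk, rk, tk in zip(h, r, t))

def score_paper(h, r, t):
    """The paper's Equation 2 with diagonal R: Re(conj(h) R t^T)."""
    return sum((hk.conjugate() * rk * tk).real for hk, rk, tk in zip(h, r, t))

def negate_imag(v):
    """Re-parameterize an entity embedding: flip the sign of the imaginary part."""
    return [vk.conjugate() for vk in v]

h = [1 + 2j, -0.5 + 1j]
r = [0.3 - 0.7j, 2 + 1j]
t = [-1 + 0.5j, 0.25 - 2j]

# The two formulations differ on the same embeddings...
assert score_complex(h, r, t) != score_paper(h, r, t)
# ...but agree exactly after negating the imaginary parts of h and t.
assert score_complex(h, r, t) == score_paper(negate_imag(h), r, negate_imag(t))
```

Since the imaginary parts are learned from data, which of the two parameterizations the model uses is invisible to training, which is why the two formulations share the same performance.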
It is the definition of the dot product in complex spaces (see Wikipedia). We unify the formulations of tensor-factorization-based KGC models using a dot product between two complex vectors, while the authors of ComplEx use a component-wise multilinear dot product. The two formulations coincide when the relation matrices are diagonal and complex.
Thanks
Thanks for your response,
I have read your code. Your code uses the scoring function Re(<h, r, \overline{t}>), but Equation 2, Re(\overline{h}Rt^\top), does not correspond to your code, even though the two achieve the same performance: (Re(h) + Im(h) i)(Re(r) + Im(r) i)(Re(t) - Im(t) i) and (Re(h) - Im(h) i)(Re(r) + Im(r) i)(Re(t) + Im(t) i) are different. Perhaps Re(hR\overline{t}^\top) would be more accurate?
Thanks for pointing out that <u, v> represents the inner product of two complex vectors. But I still do not understand what h\overline{r} means in Re(<h\overline{r}, t>): do you mean h\overline{r} is the dot product between h and \overline{r}, or h\overline{r} = (Re(h) + Im(h) i)(Re(r) + Im(r) i)?
Looking forward to your response again.
Hi,
You can simply think of it as parameterizing the negative imaginary parts of the entity embeddings. Then the score function is the same as the one in our code.
$h\overline{R}$ is the multiplication of a complex vector by a complex matrix, and the result is a complex vector. When $R$ is diagonal, it is equivalent to the element-wise product between $h$ and $r$, where $r$ is the vector of diagonal elements of $R$.
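A quick sanity check of this equivalence (hypothetical values, numpy): multiplying a complex vector by a diagonal complex matrix gives the same result as an element-wise product with the vector of diagonal entries, which is why the code can store each relation as a vector while the paper writes it as a matrix.

```python
import numpy as np

h = np.array([1 + 2j, -0.5 + 1j])   # complex "entity" vector
r = np.array([0.3 - 0.7j, 2 + 1j])  # diagonal entries of the relation matrix R
R = np.diag(r)                      # the full diagonal relation matrix

# h R == h ∘ r when R is diagonal
assert np.allclose(h @ R, h * r)
```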
Thanks
Thanks for your detailed response; I think I understand what you mean. \overline{h} is [h_0, -h_1 i], R is [[r_0, 0],[0, r_1 i]], and t is [t_0, t_1 i]. Do you mean <h\overline{R}, t> = <(h_0r_0 - h_1r_1) + (h_0r_1 + h_1r_0) i, t_0 + t_1 i>?
To be honest, it is really hard to understand. Maybe there should be more introduction in the paper (e.g., more details in Section 2, Preliminaries)? Your code does not directly correspond to your paper (although they do the same thing). In the paper, the representation R_j of relation r_j is a matrix, but in your code the representation of r_j is a vector.
Yes, I think h\overline{R} is not an ordinary multiplication if h is a complex vector and \overline{R} is a diagonal matrix.
But I still do not understand why h\overline{R} = [h_0, h_1 i] [[r_0, 0],[0, -r_1 i]] = (h_0r_0 - h_1r_1) + (h_0r_1 + h_1r_0) i. Could you provide more details about the multiplication you defined?
OK, thanks for your reply. h is h_0 + h_1 i, and R is [[r_0 + r_1 i]], so the conjugate matrix \overline{R} is [[r_0 - r_1 i]]. Then h\overline{R} = [h_0 + h_1 i][[r_0 - r_1 i]] = h_0r_0 + h_1r_1 + (h_1r_0 - h_0r_1) i, but this is not the (h_0r_0 - h_1r_1) + (h_0r_1 + h_1r_0) i we want. Did I misunderstand something again?
Yes, h\overline{R} = [h_0 + h_1 i][[r_0 - r_1 i]] = h_0r_0 + h_1r_1 + (h_1r_0 - h_0r_1) i. It leads to an equivalent formulation of ComplEx, as we have discussed before.
In our paper, the dot product between two complex vectors u and v is <u, v> = \overline{u}v^\top (see Equation 2). Thus, when taking the dot product between h\overline{R} and t, h\overline{R} actually acts as h_0r_0 + h_1r_1 + (-h_1r_0 + h_0r_1) i.
I have mentioned that you can just think of it as parameterizing the negative imaginary parts (-h_1) of the entity embeddings. In this way, to implement h_0r_0 + h_1r_1 + (-h_1r_0 + h_0r_1) i, the code computes (h_0r_0 - h_1r_1) + (h_0r_1 + h_1r_0) i.
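This step can be verified with hypothetical scalar values. The check below confirms that the dot product <u, v> = \overline{u}v^\top is what numpy's `vdot` computes (it conjugates its first argument), and that substituting h_1 -> -h_1 into the paper's expression for h\overline{R} recovers the plain complex product used in the code:

```python
import numpy as np

# <u, v> = conj(u) v^T is exactly numpy's vdot.
u = np.array([1 + 2j, -0.5 + 1j])
v = np.array([-1 + 0.5j, 0.25 - 2j])
assert np.isclose(np.vdot(u, v), np.sum(np.conj(u) * v))

# The paper's form: conj(h) r = (h0 r0 + h1 r1) + (h0 r1 - h1 r0) i
h0, h1 = 1.0, 2.0
r0, r1 = 0.3, -0.7
paper = complex(h0 * r0 + h1 * r1, h0 * r1 - h1 * r0)
assert paper == complex(h0, -h1) * complex(r0, r1)  # i.e. conj(h) * r

# Substitute h1 -> -h1 (the code parameterizes the negated imaginary part):
code = complex(h0 * r0 - h1 * r1, h0 * r1 + h1 * r0)
assert code == complex(h0, h1) * complex(r0, r1)    # plain h * r in the code
```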
OK, now I totally understand. The notation h_1 in your paper does not correspond to the vector h_1 in your code, but to the vector (-h_1) in your code. Another question: if you parameterize the negative imaginary part (-h_1) of the entity embeddings, then since the tail entity t shares the same embedding as the head entity h, do you also parameterize the negative imaginary part (-t_1)?
That is, given a 4000-dimensional complex vector [e_0, e_1] for the embedding of h or t (h and t being the same entity, e.g., lion, just in different positions), the real parameterization of the head entity h is [e_0, -e_1], and the tail entity's embedding t should also be [e_0, -e_1], since h and t are the same entity.
Yes. That is why there is no conjugation on t, unlike in the original ComplEx paper.
But why do you do this transformation? Why not make the notation in the paper correspond to the code? This transformation makes the paper harder to understand, and I could not have understood it without such a detailed explanation. Haha, you really had me going in circles: the paper never says that the imaginary parts of the parameters carry a negative sign, so I spent a long time working on paper without being able to derive the formula in your paper.
I have also mentioned that we use the formulation Re(\overline{h}Rt^\top) for notational convenience throughout the paper :). Moreover, the notations in our paper are self-consistent and equivalent to the implementation in our code.
Haha, if this implementation detail were written in the paper, even more people would probably find it confusing.
All right, but my first impression of this paper was to wonder why Equation 2 differs from the scoring function of ComplEx in ICML 2016 or ICML 2018, and the reason turns out to be that you parameterize the negative imaginary parts (-h_1) and (-t_1) of the entity embeddings.
In a word, I think more clarifications are definitely necessary.
Hi, thanks for sharing the code. I have some questions about Equation 2 in this paper.
Looking forward to your response.