@loreloc Hi Lorenzo, you are right. We did augment the data set with reciprocal triples, as empirically we found that it is better than using `--score_lhs`. Let me know if you have further questions.
Hi @yihong-chen, thank you for your answer.
So do you confirm that the loss stated in the paper (Eq. 2) is not exactly the one used in the experiments, but is actually the one shown in Lacroix et al. (2018) (Eq. 7) with the addition of the relation prediction auxiliary?
Hi @loreloc, we have two implementations in our codebase, with and without reciprocal triples. The `--score_lhs` flag should be turned on if you are not using reciprocal triples. We also derive our objective (Eq. 2) in this setting, as it makes the underlying idea of "perturbing every position" clearer. This view of "perturbing every position" is very similar to masked language modelling in NLP, if you treat each position (subject/predicate/object) as one token and mask it.
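As a rough illustration (not the exact code in this repository; the scoring helpers below are hypothetical names), the "perturb every position" objective amounts to one cross-entropy term per masked position:

```python
import torch.nn.functional as F

def perturb_every_position_loss(model, s, p, o):
    """One cross-entropy term per masked position of an (s, p, o) batch.

    `model.score_objects`, `model.score_subjects` and `model.score_relations`
    are hypothetical helpers returning logits over all candidates for the
    masked slot.
    """
    obj_logits = model.score_objects(s, p)    # (batch, n_entities)
    subj_logits = model.score_subjects(p, o)  # (batch, n_entities) -- the term --score_lhs adds
    rel_logits = model.score_relations(s, o)  # (batch, n_relations) -- the relation-prediction auxiliary

    # Summing the three terms mirrors masking each position in turn,
    # as in masked language modelling.
    return (F.cross_entropy(obj_logits, o)
            + F.cross_entropy(subj_logits, s)
            + F.cross_entropy(rel_logits, p))
```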
Our reported results are with reciprocal triples. So you are right: it is Lacroix et al. (2018) (Eq. 7) + the relation prediction auxiliary. In general, using reciprocal triples is a very useful trick, as observed in both Dettmers et al. (2018) and Lacroix et al. (2018).
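For concreteness, the reciprocal-triple augmentation typically looks something like the sketch below (illustrative only, assuming (N, 3) integer triples and a doubled relation vocabulary; not necessarily the exact helper used in this codebase):

```python
import torch

def add_reciprocal_triples(triples: torch.Tensor, n_relations: int) -> torch.Tensor:
    """For every (s, p, o), also add (o, p + n_relations, s).

    `triples` is an (N, 3) LongTensor of (subject, relation, object) ids;
    relation id `p + n_relations` plays the role of the inverse relation.
    With this augmentation, scoring only the object position already covers
    subject prediction, which is why --score_lhs can stay off.
    """
    s, p, o = triples[:, 0], triples[:, 1], triples[:, 2]
    reciprocal = torch.stack([o, p + n_relations, s], dim=1)
    return torch.cat([triples, reciprocal], dim=0)
```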
Let me know if there is anything else I can help with.
Thank you! I think this can be closed.
Hi, I have noticed that in your experiments the flag `--score_lhs` is not enabled, and it is this flag that includes the component $-\log P_\theta(s \mid p, o)$ in the loss. In contrast, the 1vsAll objective does include this conditional likelihood, so it seems there is a discrepancy between the objective function in the paper (which includes the subject-prediction term) and the one used here. Is it because you augment the data set with reciprocal triples? If so, is this equivalent to assuming that $P_\theta(S=s \mid R=p, O=o) = P_\theta(O=s \mid R=p^{-1}, S=o)$, where $p^{-1}$ denotes the inverse relation?
Thank you