Sujit-O / pykg2vec

Python library for knowledge graph embedding and representation learning.
MIT License

Problems achieving TransE's original paper results on FB15K #189

Open Rodrigo-A-Pereira opened 3 years ago

Rodrigo-A-Pereira commented 3 years ago

Hi, I am having some trouble reproducing the results of the original TransE paper on the FB15K dataset. The hyperparameters I am using at the moment are the defaults in TransE.yaml, which match what the original authors recommend in the paper (except the batch size, since they do not specify it):

Original Paper: http://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf

YAML file hyperparameters:

model_name: "TransE"
dataset: "freebase15k"
parameters:
  learning_rate: 0.01
  l1_flag: True
  hidden_size: 50
  batch_size: 128
  epochs: 1000
  margin: 1.00
  optimizer: "sgd"
  sampling: "uniform"
  neg_rate: 1
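For context, TransE scores a triple (h, r, t) by the distance ||h + r - t|| under the L1 or L2 norm (the l1_flag and margin above tune this), with lower scores meaning more plausible triples. A minimal plain-Python sketch of the scoring function (hypothetical helper name, not the pykg2vec API):

```python
def transe_score(h, r, t, l1_flag=True):
    """Translational distance ||h + r - t|| over embedding vectors.

    Lower scores mean the model considers the triple more plausible.
    l1_flag switches between the L1 and L2 norms, as in the YAML above.
    """
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if l1_flag:
        return sum(abs(d) for d in diffs)       # L1 norm
    return sum(d * d for d in diffs) ** 0.5     # L2 norm
```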

However, the results I am getting are higher than the original paper's (a difference of more than 10% in the case of filtered Hits@10):

Original paper results: (screenshot not shown)

Vs.

Pykg2vec TransE results: (screenshot not shown)

To run the model I am using the following command: python train.py -mn TransE -ds freebase15k -device "cuda"

Can somebody tell me if I am doing something wrong in terms of calling the script or choosing hyperparameters? If not, does anyone have a hypothesis for why such a difference exists?

Best regards,

Rodrigo Pereira

baxtree commented 3 years ago

Hi, @Rodrigo-A-Pereira ,

Thanks for reporting this. I ran 500 epochs with the default hyperparameters and got a similar result for Hits@10, while the filtered MR is much higher than yours:

------Test Results for freebase15k: Epoch: 500 --- time: 53.90------------
--# of entities, # of relations: 14951, 1345
--mr,  filtered mr             : 235.6700, 140.6000
--mrr, filtered mrr            : 0.2410, 0.3775
--hits1                        : 0.1335 
--filtered hits1               : 0.2410 
--hits3                        : 0.2695 
--filtered hits3               : 0.4520 
--hits5                        : 0.3545 
--filtered hits5               : 0.5370 
--hits10                       : 0.4720 
--filtered hits10              : 0.6310
---------------------------------------------------------
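For readers comparing the raw and filtered columns above: in the filtered setting, corrupted candidates that are themselves true triples (anywhere in train/valid/test) are excluded before ranking, which is why the filtered MR/MRR/Hits@k are always at least as good as the raw ones. A rough sketch of the ranking step, assuming a generic score_fn where lower is better (illustrative only, not the pykg2vec evaluator):

```python
def tail_rank(score_fn, h, r, true_t, entities, known_triples, filtered):
    """Rank the true tail among all candidate tails (lower score = better).

    In the filtered setting, corrupted triples that are known to be true
    are skipped, so they cannot push the correct answer down the ranking.
    """
    true_score = score_fn(h, r, true_t)
    rank = 1
    for t in entities:
        if t == true_t:
            continue
        if filtered and (h, r, t) in known_triples:
            continue
        if score_fn(h, r, t) < true_score:
            rank += 1
    return rank
```

Mean rank (MR) averages this rank over the test set, MRR averages its reciprocal, and Hits@10 is the fraction of test triples with rank <= 10.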
Rodrigo-A-Pereira commented 3 years ago

Thanks for replying @baxtree,

Your filtered MR is indeed much closer to the original paper's results.

After exploring this a bit more, I came to the conclusion that the problem probably stems from the dataset rather than the implementation of the algorithm: when training on the second dataset the TransE authors report in their paper (WN18), with the exact same hyperparameters they used there, I obtain results very similar to the reported ones:

Original TransE paper results for the WN18 dataset: (screenshot not shown)

Results and hyperparameters used with pykg2vec on WN18:

model_name: "TransE"
dataset: "wn18"
parameters:
  learning_rate: 0.01
  l1_flag: True
  hidden_size: 20
  batch_size: 128
  epochs: 1000
  margin: 2.00
  optimizer: "sgd"
  sampling: "uniform"
  neg_rate: 1
2020-10-28 02:08:45,254 - pykg2vec.utils.evaluator - INFO - Full-Testing on [5000/5000] Triples in the test set.
100% 5000/5000 [00:45<00:00, 109.95it/s]
2020-10-28 02:09:30,733 - pykg2vec.utils.evaluator - INFO - 
------Test Results for wn18: Epoch: 999 --- time: 45.48------------
--# of entities, # of relations: 40943, 18
--mr,  filtered mr             : 339.0680, 326.6000
--mrr, filtered mrr            : 0.3363, 0.4451
--hits1                        : 0.0989 
--filtered hits1               : 0.1532 
--hits3                        : 0.5072 
--filtered hits3               : 0.7029 
--hits5                        : 0.6330 
--filtered hits5               : 0.8107 
--hits10                       : 0.7584 
--filtered hits10              : 0.8892 
---------------------------------------------------------

As can be seen, the Hits@10 result is much closer to the one reported in the paper. That is not the case with MR, but this is not that surprising given the volatility of that metric; I assume the MRR would also be very close to the original, had the authors reported MRR.

As such, I deduce that the discrepancy is probably related to the dataset somehow, given that FB15K has been reported to suffer from major test leakage through inverse relations.

However, I still cannot say for sure what causes this disparity between the original paper's results and pykg2vec's when it comes to FB15K.
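The inverse-relation leakage mentioned above can be estimated directly: a test triple is trivially predictable if its reversed entity pair already occurs in training under some relation. A crude sketch of such a check (hypothetical helper, not part of pykg2vec):

```python
def inverse_leakage_rate(train_triples, test_triples):
    """Fraction of test triples (h, r, t) whose reversed entity pair (t, h)
    appears in training under any relation -- a rough proxy for the
    inverse-relation test leakage reported for FB15K."""
    train_pairs = {(h, t) for h, _, t in train_triples}
    leaked = sum(1 for h, _, t in test_triples if (t, h) in train_pairs)
    return leaked / len(test_triples)
```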

baxtree commented 3 years ago

Oh cool! In that case, we will add WN18 as a canonical dataset for TransE, which may benefit other users.

Sounds like the disparity in Hits@* is definitely worth some further investigation.

ArkDu commented 3 years ago

Hi @Rodrigo-A-Pereira, one thing that might contribute to the difference in performance is that the implementation here is not exactly the same as what the TransE paper proposed. If you look at the algorithm in the paper (page 3, "Algorithm 1 Learning TransE"), line 5 reads e ← e / ||e|| for each entity e ∈ E, which means it normalizes entities only inside the training loop, and normalizes relations once during initialization instead (as line 2 shows). In our implementation, however,

def forward(self, h, r, t):
    """Function to get the embedding value.

       Args:
           h (Tensor): Head entity ids.
           r (Tensor): Relation ids.
           t (Tensor): Tail entity ids.

        Returns:
            Tensor: the (L1 or L2) translational distance scores of the triples.
    """
    h_e, r_e, t_e = self.embed(h, r, t)

    norm_h_e = F.normalize(h_e, p=2, dim=-1)
    norm_r_e = F.normalize(r_e, p=2, dim=-1)
    norm_t_e = F.normalize(t_e, p=2, dim=-1)

    if self.l1_flag:
        return torch.norm(norm_h_e + norm_r_e - norm_t_e, p=1, dim=-1)

    return torch.norm(norm_h_e + norm_r_e - norm_t_e, p=2, dim=-1)

As the forward function shows, we normalize both entities and relations on every forward pass. Please check pykg2vec/models/pairwise.py (https://github.com/Sujit-O/pykg2vec/blob/master/pykg2vec/models/pairwise.py), the TransE section, for more detail. We are unsure how much difference this implementation choice makes, but it might contribute to performance different from the numbers shown in the original paper. We are still investigating the issue and will let you know when we have progress. Thank you for reporting the issue!
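For comparison, a sketch of the paper's Algorithm 1 convention as described above, where relation embeddings are L2-normalized once at initialization and only entity embeddings are re-normalized during training (plain Python for illustration; pykg2vec itself uses torch, and these helper names are hypothetical):

```python
import math

def l2_normalize(v):
    """Project a vector onto the unit L2 sphere (leave zero vectors alone)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def init_relation_embeddings(rel_embeddings):
    """Algorithm 1, init step: normalize relation vectors once, at init only."""
    return {r: l2_normalize(e) for r, e in rel_embeddings.items()}

def pre_batch_entity_step(ent_embeddings):
    """Algorithm 1, line 5: e <- e / ||e|| for every entity, repeated before
    each minibatch; relation vectors are deliberately left untouched here."""
    return {e: l2_normalize(v) for e, v in ent_embeddings.items()}
```

pykg2vec's forward instead re-normalizes h, r, and t on every pass, which keeps relation vectors on the unit sphere throughout training and can change the learned geometry.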

Best,

Rodrigo-A-Pereira commented 3 years ago

Hi @ArkDu,

Thank you for replying. Yes, that seems like a plausible implementation difference that could explain the discrepancy in results.