We just refer to this code: https://github.com/MichSchli/RelationPrediction. Could you turn to the original author for more info?
Thanks for your reply. I did refer to the original code from the author. It seems he used the source data to split train, validation and test, and used the validation data to evaluate the model. Reference code here and here.
Since we can obtain the train, validation and test splits directly from the dataset, I suppose we should use them to build the `valid_graph` and `test_graph` directly, and evaluate/test the model accordingly.
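To make the suggestion concrete, here is a minimal sketch of building an evaluation graph from a given triplet split with DGL. The helper name and the details (e.g. inverse edges, edge normalization, which the example may also handle) are assumptions for illustration, not the example's actual utilities.

```python
import torch
import dgl

def build_graph_from_triplets(num_nodes, triplets):
    # triplets: array-like of (src, rel, dst) rows from one split
    src, rel, dst = torch.as_tensor(triplets, dtype=torch.int64).t()
    g = dgl.graph((src, dst), num_nodes=num_nodes)
    g.edata['etype'] = rel  # store relation types on the edges
    return g

# Hypothetical usage: build per-split graphs instead of reusing train_data.
# valid_graph = build_graph_from_triplets(num_nodes, valid_data)
# test_graph = build_graph_from_triplets(num_nodes, test_data)
```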
Have you tried the dataset split you describe? Is there any difference from the current split strategy?
There is not much difference in model performance on other link prediction datasets. I just wonder if it would be better to use the real validation and test datasets instead.
Do you mean replacing `train_data` with `valid_data` to generate the `valid_graph` for validation in https://github.com/dmlc/dgl/blob/7c7b60be18cc77793aa7ee1815c0993e23e24c1e/examples/pytorch/rgcn/link_predict.py#L115? And doing the same with `test_data`?
Yes, exactly.
And you have already tried it this way? Not much difference?
I have done experiments on the ICEWS18 and GDELT datasets for link prediction. The performance seems similar to that reported in some of the papers. If time permits, I could try to run the experiments on the datasets used in the original RGCN paper.
So, according to your experiment results so far, there is not much difference between the `test_graph` generated from `train_data` (the way we do it in the DGL example) and the `test_graph` generated from `test_data` (the way you wonder might be better)? But generating the `test_graph` from `train_data` seems a bit unreasonable to you?
I haven't run experiments directly comparing the model's performance between the DGL implementation and the variant that uses a `test_graph` generated from `test_data`. What I did was use the real test data (a test graph generated from the test data) and compare its performance on link prediction datasets with the RGCN numbers reported in some papers. Those papers claim to use DGL's implementation for running the RGCN baselines, but I'm not sure whether they modified anything or followed the exact same implementation as the example. From the results I got, the performances seem similar and there are no big differences. I might not have explained things clearly above; sorry for the misunderstanding.
So could you compare DGL's `test_graph` and the real `test_graph` (generated from `test_data` instead of `train_data`)? If there is a difference, the implementation in the DGL example is likely incorrect. If not, it would look a bit weird, as it does not make much sense. What do you think about this?
Yeah, sure. I can do that in my spare time this week and will get back to this thread when I have results. Thanks!
Hi, sorry for the late update. I have run the experiments and got the following results:

| Dataset | Model | MRR | Hits@1 | Hits@3 | Hits@10 |
|---|---|---|---|---|---|
| FB15k-237 | Paper | 15.6 | 15.1 | 26.4 | 41.7 |
| FB15k-237 | DGL orig | 15.73 | 9.62 | 15.90 | 27.80 |
| FB15k-237 | DGL real test | 18.35 | 11.35 | 18.94 | 31.89 |
| WN18 | Paper | 56.1 | 69.7 | 92.9 | 96.4 |
| WN18 | DGL orig | 53.63 | 39.69 | 62.71 | 79.64 |
| WN18 | DGL real test | 73.08 | 65.65 | 78.27 | 86.19 |
The raw MRR metric is used in these experiments. `DGL orig` refers to the original DGL example implementation, while `DGL real test` is the variant that uses the real validation and test data for evaluation.
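For reference, here is a minimal sketch of how raw MRR and Hits@k are typically computed from the 1-based ranks of the ground-truth entities. This is the standard definition, not a copy of the example's evaluation code; "raw" means the ranks are taken over all candidate entities without filtering out other known true triplets.

```python
import torch

def raw_mrr_and_hits(ranks, ks=(1, 3, 10)):
    # ranks: 1-based rank of the true entity for each test triplet
    ranks = ranks.float()
    metrics = {'MRR': (1.0 / ranks).mean().item()}
    for k in ks:
        metrics['Hits@%d' % k] = (ranks <= k).float().mean().item()
    return metrics

# Example with made-up ranks for five test triplets.
print(raw_mrr_and_hits(torch.tensor([1, 4, 2, 15, 120])))
```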
Hi, we took a look at the original code again. Actually, using `train_data` is expected, because this is the link prediction task rather than entity classification. For link prediction, you need to utilize the training graph to predict the edges in the test set, which is why it is passed to the forward function.

However, we found that the naming in our code is quite confusing, and we are planning to rewrite the whole example soon.
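A sketch of the pattern being described: the R-GCN encoder runs message passing over the training graph to produce entity embeddings, and the validation/test triplets are only scored by the decoder (DistMult, as in the R-GCN paper). The names and shapes below are illustrative assumptions, not the example's actual function signatures.

```python
import torch

def distmult_score(entity_emb, rel_emb, triplets):
    # entity_emb: (num_nodes, d) output of the R-GCN run on the *training* graph
    # rel_emb:    (num_rels, d) diagonal relation parameters
    # triplets:   (n, 3) LongTensor of (src, rel, dst) from the valid/test split
    s, r, o = triplets.t()
    return torch.sum(entity_emb[s] * rel_emb[r] * entity_emb[o], dim=1)
```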
Thanks for the clarification. It makes more sense now.
❓ Questions and Help
Hi, I have two small questions regarding the implementation of the PyTorch RGCN.

1. In the `link_prediction.py` file, `train_data` is used to build the test graph. (Reference line here.)
2. `test_data` is used to validate model performance and calculate metrics such as filtered MRR and raw MRR. (Reference codes here and here.) Does this suggest that we are picking the best model on the test set?

I feel like these two details might be related to the filtered MRR metric computation, but are they still valid for raw MRR? Can anyone explain it a bit to me? I'm really puzzled here. Thanks in advance!
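For what it's worth, the difference between raw and filtered MRR lies in how the rank is computed for each query, not in which graph is used: filtered evaluation masks out other objects already known to be true for the same (subject, relation) pair before ranking. A minimal illustrative sketch (not the example's actual evaluation code):

```python
import torch

def rank_true_object(scores, true_obj, known_objs=None):
    # scores: (num_entities,) score per candidate object for one (s, r, ?) query
    # known_objs: objects known true for (s, r) across all splits;
    #             pass None for raw ranking, a LongTensor/list for filtered ranking.
    if known_objs is not None:
        scores = scores.clone()
        mask = torch.zeros_like(scores, dtype=torch.bool)
        mask[known_objs] = True
        mask[true_obj] = False  # never mask the target itself
        scores[mask] = float('-inf')
    # 1-based rank of the true object under descending scores
    return int((scores > scores[true_obj]).sum().item()) + 1
```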