We just refer to this code: https://github.com/MichSchli/RelationPrediction. Could you turn to the original author for more info?
Thanks for your reply. I did refer to the original code from the author. It seems he used the source data to split train, validation and test, and used the validation data to evaluate the model. Reference code here and here.
Since we can obtain the train, validation and test splits directly from the dataset, I suppose we should use them to build the `valid_graph` and `test_graph` directly, and evaluate/test the model accordingly.
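To make the suggestion concrete, here is a minimal sketch of building an evaluation graph from a given triplet split with DGL. The helper name and the details (e.g. inverse edges, edge normalization, which the example may also handle) are assumptions for illustration, not the example's actual utilities.

```python
import torch
import dgl

def build_graph_from_triplets(num_nodes, triplets):
    # triplets: array-like of (src, rel, dst) rows from one split
    src, rel, dst = torch.as_tensor(triplets, dtype=torch.int64).t()
    g = dgl.graph((src, dst), num_nodes=num_nodes)
    g.edata['etype'] = rel  # store relation types on the edges
    return g

# Hypothetical usage: build per-split graphs instead of reusing train_data.
# valid_graph = build_graph_from_triplets(num_nodes, valid_data)
# test_graph = build_graph_from_triplets(num_nodes, test_data)
```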
Have you tried the dataset split you describe? Is there any difference from the current split strategy?
There is not much difference in model performance on other link prediction datasets. I just wonder if it would be better to use the real validation and test datasets instead.
Do you mean replacing `train_data` with `valid_data` to generate the `valid_graph` for validation in https://github.com/dmlc/dgl/blob/7c7b60be18cc77793aa7ee1815c0993e23e24c1e/examples/pytorch/rgcn/link_predict.py#L115? And doing the same with `test_data`?
Yes, exactly.
And you have already tried it this way? Not much difference?
I have done experiments on the ICEWS18 and GDELT datasets for link prediction. The performance seems similar to that reported in some of the papers. If time permits, I could try to run the experiments on the datasets used in the original RGCN paper.
So, according to your experiment results so far, there is not much difference between the `test_graph` generated from `train_data` (the way we do it in the DGL example) and the `test_graph` generated from `test_data` (the way you wonder might be better)? But generating the `test_graph` from `train_data` seems a bit unreasonable to you?
I haven't run experiments directly comparing the model's performance between the DGL implementation and the variant that uses a `test_graph` generated from `test_data`. What I did was use the real test data (a test graph generated from the test data) and compare its performance on link prediction datasets with the RGCN numbers reported in some papers. Those papers claim to use DGL's implementation for running the RGCN baselines, but I'm not sure whether they modified anything or followed the exact same implementation as the example. From the results I got, the performances seem similar and there are no big differences. I might not have explained things clearly above; sorry for the misunderstanding.
So could you compare DGL's `test_graph` and the real `test_graph` (generated from `test_data` instead of `train_data`)? If there is a difference, the implementation in the DGL example is likely incorrect. If not, it would look a bit weird, as it does not make much sense. What do you think about this?
Yeah, sure. I can do that in my spare time this week and will get back to this thread when I have results. Thanks!
Hi, sorry for the late update. I have run the experiments and got the following results:

| Dataset | Model | MRR | Hits@1 | Hits@3 | Hits@10 |
|---|---|---|---|---|---|
| FB15k-237 | Paper | 15.6 | 15.1 | 26.4 | 41.7 |
| FB15k-237 | DGL orig | 15.73 | 9.62 | 15.90 | 27.80 |
| FB15k-237 | DGL real test | 18.35 | 11.35 | 18.94 | 31.89 |
| WN18 | Paper | 56.1 | 69.7 | 92.9 | 96.4 |
| WN18 | DGL orig | 53.63 | 39.69 | 62.71 | 79.64 |
| WN18 | DGL real test | 73.08 | 65.65 | 78.27 | 86.19 |
The raw MRR metric is used in these experiments. `DGL orig` refers to the original DGL example implementation, while `DGL real test` is the variant that uses the real validation and test data for evaluation.
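For reference, here is a minimal sketch of how raw MRR and Hits@k are typically computed from the 1-based ranks of the ground-truth entities. This is the standard definition, not a copy of the example's evaluation code; "raw" means the ranks are taken over all candidate entities without filtering out other known true triplets.

```python
import torch

def raw_mrr_and_hits(ranks, ks=(1, 3, 10)):
    # ranks: 1-based rank of the true entity for each test triplet
    ranks = ranks.float()
    metrics = {'MRR': (1.0 / ranks).mean().item()}
    for k in ks:
        metrics['Hits@%d' % k] = (ranks <= k).float().mean().item()
    return metrics

# Example with made-up ranks for five test triplets.
print(raw_mrr_and_hits(torch.tensor([1, 4, 2, 15, 120])))
```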
Hi, we took a look at the original code again. Actually, using `train_data` is expected, because this is the link prediction task rather than entity classification. For link prediction, you need to utilize the training graph to predict the edges in the test set, which is why it is passed to the forward function.

However, we found that the naming in our code is quite confusing, and we are planning to rewrite the whole example soon.
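A sketch of the pattern being described: the R-GCN encoder runs message passing over the training graph to produce entity embeddings, and the validation/test triplets are only scored by the decoder (DistMult, as in the R-GCN paper). The names and shapes below are illustrative assumptions, not the example's actual function signatures.

```python
import torch

def distmult_score(entity_emb, rel_emb, triplets):
    # entity_emb: (num_nodes, d) output of the R-GCN run on the *training* graph
    # rel_emb:    (num_rels, d) diagonal relation parameters
    # triplets:   (n, 3) LongTensor of (src, rel, dst) from the valid/test split
    s, r, o = triplets.t()
    return torch.sum(entity_emb[s] * rel_emb[r] * entity_emb[o], dim=1)
```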
Thanks for the clarification. It makes more sense now.
❓ Questions and Help
Hi, I have two small questions regarding the implementation of the PyTorch RGCN.

1. In the `link_prediction.py` file, `train_data` is used to build the test graph. (Reference line here.)
2. `test_data` is used to validate model performance and calculate metrics such as filtered MRR and raw MRR. (Reference codes here and here.) Does this suggest that we are picking the best model on the test set?

I feel like these two details might be related to the filtered MRR metric computation, but are they still valid for raw MRR? Can anyone explain it a bit to me? I'm really puzzled here. Thanks in advance!
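For what it's worth, the difference between raw and filtered MRR lies in how the rank is computed for each query, not in which graph is used: filtered evaluation masks out other objects already known to be true for the same (subject, relation) pair before ranking. A minimal illustrative sketch (not the example's actual evaluation code):

```python
import torch

def rank_true_object(scores, true_obj, known_objs=None):
    # scores: (num_entities,) score per candidate object for one (s, r, ?) query
    # known_objs: objects known true for (s, r) across all splits;
    #             pass None for raw ranking, a LongTensor/list for filtered ranking.
    if known_objs is not None:
        scores = scores.clone()
        mask = torch.zeros_like(scores, dtype=torch.bool)
        mask[known_objs] = True
        mask[true_obj] = False  # never mask the target itself
        scores[mask] = float('-inf')
    # 1-based rank of the true object under descending scores
    return int((scores > scores[true_obj]).sum().item()) + 1
```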