Hi @wysqh,
Sorry for the late reply. For the Twitter dataset, we only evaluate the performance on the first edge type by setting `--eval-type 1`, since that type contains the majority of the edges (see the command sketch below). You can see the meaning of each edge type on this page: https://snap.stanford.edu/data/higgs-twitter.html. Could you please report your results for the YouTube dataset?
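For reference, a minimal sketch of such a run, assuming the repository's entry point is `src/main.py` and that it accepts an `--input` flag (both assumptions; only `--eval-type 1` is confirmed in this thread):

```bash
# Hypothetical invocation: the script path and --input flag are assumptions;
# only --eval-type 1 is taken from the reply above.
python src/main.py --input data/twitter --eval-type 1
```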
Hi @cenyk1230, thanks for your kind reply. The performance on the YouTube dataset is nearly the same as reported in the paper; it was my own mistake, as I had set the wrong dataset. For the Twitter dataset, I get similar performance under your instructions.
Hi, this is quite solid work. Thanks for open-sourcing the code; it has given me a lot of insight. However, I have encountered some problems with the code. Specifically, I reimplemented the GATNE-T model on my own and ran both the original model and my reimplementation on three datasets (Amazon, YouTube, and Twitter). The experimental results (ROC-AUC, PR-AUC, F1) are basically the same on Amazon, but there is a large gap on the YouTube and Twitter datasets (e.g., on Twitter I get ROC-AUC around 0.74, PR-AUC around 0.78, and F1 around 0.64). This happens with both the original model and my reimplementation. I also find that the Twitter test.txt contains no edges of type '3', which causes a KeyError in the published code.
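For illustration, the failure mode looks roughly like the sketch below: indexing a dict of test edges keyed by edge type raises a KeyError when a type is missing, while `.get` with a default (or iterating only over the types actually present) skips it gracefully. All names here are mine, not the repository's actual identifiers:

```python
# Hedged sketch, not the repository's actual code: guard against edge types
# that are absent from test.txt (e.g. type "3" in the Twitter split).
test_edges = {"1": [("u1", "u2")], "2": [("u3", "u4")]}  # no type "3" here

for edge_type in ["1", "2", "3"]:
    edges = test_edges.get(edge_type, [])  # returns [] instead of raising KeyError
    if not edges:
        print(f"skipping edge type {edge_type}: no test edges")
        continue
    print(f"evaluating {len(edges)} edges of type {edge_type}")
```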
Could you give me some hints on closing the gap in the evaluation metrics on the YouTube and Twitter datasets (e.g., learning rate, weight decay, or something else)? I would greatly appreciate your kind and quick reply.