
YuxiangRen / Heterogeneous-Deep-Graph-Infomax

HDGI code


about the experiment #2

Closed DannyWu1996 closed 4 years ago

DannyWu1996 commented 4 years ago

Great respect for your work. I just have a question about the results for the HAN model. Did you use the TensorFlow version of HAN implemented by its author? I can't reproduce the results he reported in his paper. Any suggestions?

YuxiangRen commented 4 years ago

I implemented HAN in PyTorch instead of using the authors' TensorFlow version. The results reported in my paper are based on my own code and also deviate somewhat from the original results.

DannyWu1996 commented 4 years ago

Good to know. Did you follow the hyper-parameters from the original paper, or did you search for the best model yourself?

YuxiangRen commented 4 years ago

I followed the hyper-parameters from the original paper, but the initial feature extraction may be different. I kept the settings fair across all my comparison methods and did not focus on the absolute values of HAN.

DannyWu1996 commented 4 years ago

Okay, last question (I hope I'm not bothering you, lol). In your experiments, you mentioned that you used 20% and 80% of the data as two different training sets. For supervised models like GCN and GAT, are the node classification results in your paper the direct output of the model in the end-to-end framework? The description in HAN's paper is kind of confusing, so I'm trying to figure out the difference between your setup and HAN's.

YuxiangRen commented 4 years ago

I am glad to answer your questions. For the supervised methods, the results come directly from the end-to-end framework. Our framework, however, is an unsupervised method for learning representations, so we train an MLP classifier on the learned representations with 20% and 80% training data respectively to conduct the classification tasks. If you have any other questions, don't hesitate to contact me.

DannyWu1996 commented 4 years ago

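Hahaha, I just saw that you're actually from 南大. I happen to be working on this too, also doing a comparison based on HAN. I'll follow your work from now on! Thanks for answering my questions today~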

YuxiangRen commented 4 years ago

You're welcome!

DannyWu1996 commented 4 years ago

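One more question: when you ran the end-to-end models (GCN, GAT, HAN) with 20% and 80% training data, what proportion did you use for the validation set? Was it 10% in both cases, with the remaining part used as the test set?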

YuxiangRen commented 4 years ago

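I split the nodes into 10 folds in total: 1 fold is used as the validation set and 1 fold as the test set. The rest are used for training. For the 20% setting, I pick 2 of those 8 folds as the training set, which is 20% of the total data.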

YuxiangRen commented 4 years ago

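To keep the results fair, you generally need to keep the sizes of the validation set and test set fixed. Otherwise, a change in performance is not necessarily caused by the larger training set; it could also be caused by the smaller test set.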
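
For readers who want to reproduce this protocol, here is a minimal sketch of the split described above. It is not taken from the HDGI repository: the function name `make_splits`, the seed, and the choice of which folds serve as validation, test, and training are all assumptions made for illustration. The only parts taken from the discussion are the 10 folds, the single validation fold, the single test fold, and training on 2 or 8 of the remaining folds.

```python
import random


def make_splits(num_nodes, train_folds, num_folds=10, seed=0):
    """Split node indices into train/val/test over a fixed fold partition.

    Hypothetical helper: fold 0 is always validation and fold 1 is always
    test, so their sizes do not change with the training fraction.
    """
    if not 1 <= train_folds <= num_folds - 2:
        raise ValueError("train_folds must be between 1 and num_folds - 2")

    # Shuffle once with a fixed seed so every call sees the same partition.
    indices = list(range(num_nodes))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices into num_folds folds of near-equal size.
    folds = [indices[i::num_folds] for i in range(num_folds)]

    val, test = folds[0], folds[1]
    # Assumption: take the first `train_folds` of the remaining folds.
    train = [i for fold in folds[2:2 + train_folds] for i in fold]
    return train, val, test


# 20% and 80% training settings share the same validation and test nodes.
train20, val20, test20 = make_splits(1000, train_folds=2)
train80, val80, test80 = make_splits(1000, train_folds=8)
print(len(train20), len(val20), len(test20))  # 200 100 100
print(len(train80), len(val80), len(test80))  # 800 100 100
```

Because the shuffle uses the same seed in both calls, `val20 == val80` and `test20 == test80`, so any difference between the two settings comes only from the amount of training data. For class-imbalanced node labels, a stratified partition may be preferable, but that goes beyond what is described in this thread.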

DannyWu1996 commented 4 years ago

Got it! Thanks, that clears it up~