YuxiangRen / Heterogeneous-Deep-Graph-Infomax

HDGI code

The initial features #9

Open 960924 opened 4 years ago

960924 commented 4 years ago

Can you tell me in detail how to get the initial features in DBLP? Thank you very much if possible.

YuxiangRen commented 4 years ago

The DBLP dataset used in this paper originally comes from "Graph-based consensus maximization among multiple supervised and unsupervised models", but we use the extended version from the paper "Heterogeneous Graph Attention Network" (HAN). In this version, each author has a set of keywords that make up the author's profile. You can get this dataset from the HAN code. I can also provide you with the dataset.

Because of GitHub's file size limit, I put the DBLP dataset at https://drive.google.com/open?id=1zdF3KGp0sk3ZatEvrF6_QHTuxK40-fz8

Author features are the elements of a bag-of-words representation of the keywords. The size of the keyword vocabulary is 334.
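To illustrate what a bag-of-words feature vector looks like, here is a minimal sketch. It is not the preprocessing code from HAN or HDGI: the helper names `build_vocabulary` and `bag_of_words`, the toy keywords, and the choice of counts rather than 0/1 indicators are all assumptions made for illustration. On the real dataset the vocabulary would have 334 entries instead of 5.

```python
# Hypothetical sketch: bag-of-words author features from per-author keywords.
# Names and data are illustrative and not taken from the HDGI or HAN code.

def build_vocabulary(author_keywords):
    """Map every distinct keyword to a column index (sorted for stability)."""
    vocab = sorted({kw for kws in author_keywords.values() for kw in kws})
    return {kw: idx for idx, kw in enumerate(vocab)}


def bag_of_words(author_keywords, vocab):
    """Return one count vector per author, with one column per keyword."""
    features = {}
    for author, kws in author_keywords.items():
        row = [0] * len(vocab)
        for kw in kws:
            row[vocab[kw]] += 1  # assumption: counts; a 0/1 indicator is also common
        features[author] = row
    return features


# Toy input: each author's profile is a list of keywords.
author_keywords = {
    "author_a": ["graph", "mining", "graph"],
    "author_b": ["database", "query"],
    "author_c": ["mining", "learning"],
}

vocab = build_vocabulary(author_keywords)
features = bag_of_words(author_keywords, vocab)

for author, row in features.items():
    print(author, row)
```

Each row has the same length as the vocabulary, so stacking the rows gives an authors-by-keywords feature matrix.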

960924 commented 4 years ago

Thank you very much for your reply. I am still wondering whether you have analyzed why the clustering results on IMDB are far lower than those on the other datasets. In addition, if I want to remove the "Conference" nodes in DBLP, can I simply not use "APCPA"? Do the author features need to be changed?

YuxiangRen commented 4 years ago

I think the performance depends on the dataset as well as on the label chosen to evaluate the clustering result. I don't think the author features need to change when you drop the conference nodes.
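A small sketch of why the features stay the same: meta-path adjacency matrices are built from the graph structure, while the author feature matrix is built from keywords only. This is a toy illustration, not the repository's code. The matrices `AP` (author-paper) and `PC` (paper-conference), the "APA" meta-path, and the helper functions are assumptions introduced here.

```python
# Hypothetical sketch: dropping the conference-based meta-path leaves the
# author features untouched. All names and data are illustrative.

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def binarize(m):
    """Turn path counts into a 0/1 adjacency matrix."""
    return [[1 if v > 0 else 0 for v in row] for row in m]


# Toy graph: 3 authors, 3 papers, 2 conferences.
AP = [[1, 0, 0],   # author 0 wrote paper 0
      [0, 1, 0],   # author 1 wrote paper 1
      [0, 1, 1]]   # author 2 wrote papers 1 and 2
PC = [[1, 0],      # paper 0 appeared at conference 0
      [1, 0],      # paper 1 appeared at conference 0
      [0, 1]]      # paper 2 appeared at conference 1

# Author-to-author adjacency along each meta-path.
APA = binarize(matmul(AP, transpose(AP)))
APC = matmul(AP, PC)
APCPA = binarize(matmul(APC, transpose(APC)))

meta_paths = {"APA": APA, "APCPA": APCPA}

# Removing conference nodes: keep only meta-paths that do not pass through "C".
selected = {name: adj for name, adj in meta_paths.items() if "C" not in name}

# Author features (toy bag-of-words rows) depend on keywords only.
author_features = [[0, 2, 0], [1, 0, 1], [0, 1, 1]]

print(list(selected))
print(all(len(adj) == len(author_features) for adj in selected.values()))
```

The selected adjacency matrices still have one row per author, so they line up with the unchanged feature matrix.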

960924 commented 4 years ago

I'm sorry to ask you another question. Could you please provide the code to extract the features in IMDB? It would be great if you could.

960924 commented 4 years ago

Hello, did you not split the dataset when training the model? Does this affect performance? Looking forward to your reply.