eXascaleInfolab / JUST


node classification task #3

Closed amblee0306 closed 4 years ago

amblee0306 commented 4 years ago

Hi @ranahussein , is it possible to obtain the part of the code where you performed the node classification task using the generated embeddings? Thanks :)

ranahussein commented 4 years ago

Hello, we used the scoring python code from DeepWalk to evaluate the generated embeddings: https://github.com/phanein/deepwalk
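For readers who want the gist without the DeepWalk repo: its scoring evaluates embeddings by training a one-vs-rest logistic regression on a labeled split and reporting micro/macro F1. A minimal self-contained sketch of that procedure, using synthetic embeddings and labels (the real inputs would be the JUST embeddings and the DBLP author labels):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 nodes, 64-dim embeddings, 3 classes.
embeddings = rng.normal(size=(200, 64))
labels = rng.integers(0, 3, size=200)

# Train on half the labeled nodes, test on the rest.
X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, train_size=0.5, random_state=0)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

micro = f1_score(y_test, pred, average="micro")
macro = f1_score(y_test, pred, average="macro")
print(f"micro-F1 {micro:.3f}  macro-F1 {macro:.3f}")
```

The DeepWalk script additionally shuffles the split several times and averages the scores over varying train fractions; this sketch shows a single split only.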

amblee0306 commented 4 years ago

Hi there, thanks for the reply. The scoring function also takes a .mat input file. Can I ask how you generated the .mat file? Or is it possible to download it from somewhere for the DBLP dataset?

amblee0306 commented 4 years ago

@ranahussein alternatively, can you please make the labels of the nodes available? Edit: I just realised the labels are available, only the file is named actor_labels.txt instead of author_labels.txt.

besides that, do you also have the feature vectors of the author nodes?

ranahussein commented 4 years ago

Hello. Yes, these are the labeled nodes for the DBLP dataset. We don't use feature vectors; we only use the resulting embeddings. You can use the authors' labels to generate the sparse .mat file (e.g., see the uploaded .mat for DBLP). In the scoring python code we only use the part of the .mat file corresponding to mat['group']; you will need to adjust your code, as not all authors in the graph have labeled data.
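The labels-to-.mat step above can be sketched as follows. This is a minimal illustration, not the authors' actual preprocessing: the file name, the number of nodes/labels, and the `(node_id, label)` pair format are all assumptions; the only thing the scoring code is said to need is the sparse `group` matrix.

```python
import numpy as np
from scipy.io import savemat, loadmat
from scipy.sparse import csr_matrix

# Assumed (node_id, label) pairs; nodes absent here are unlabeled,
# which is why the scoring code must be adjusted for partial labels.
pairs = [(0, 0), (1, 2), (3, 1), (4, 2)]
n_nodes, n_labels = 5, 3

# Binary node-by-label indicator matrix, stored sparse.
rows = [n for n, _ in pairs]
cols = [lab for _, lab in pairs]
group = csr_matrix((np.ones(len(pairs)), (rows, cols)),
                   shape=(n_nodes, n_labels))

# DeepWalk's scoring reads mat['group'] from the .mat file.
savemat("dblp_labels.mat", {"group": group})

# Round-trip check.
loaded = loadmat("dblp_labels.mat")["group"]
print(loaded.shape)
```

Multi-label nodes are handled naturally by this layout: a node with two labels simply gets two nonzero entries in its row.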

amblee0306 commented 4 years ago

@ranahussein thanks for the clarification. Can I ask how I can get the feature vectors for the author nodes, even though you don't use them in your model? I saw that some other DBLP datasets have author feature vectors, but I'm afraid the ordering of that data differs from what you are using, so I can't use the datasets together. I want to compare the embeddings with the given feature vectors. Thanks!

ranahussein commented 4 years ago

Unfortunately I don’t have this. Maybe you can refer to the original authors of the dataset, you will find their reference in the paper/github.

amblee0306 commented 4 years ago

@ranahussein can I ask how you preprocessed the DBLP dataset from the original dataset? I'm wondering how to extract exactly the 5915 author nodes that you are using.

ranahussein commented 4 years ago

This was done by the authors of these papers:

Huang, Zhipeng, et al. "Meta structure: Computing relevance in large heterogeneous information networks." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.

Huang, Zhipeng, and Nikos Mamoulis. "Heterogeneous information network embedding for meta path based proximity." arXiv preprint arXiv:1701.05291 (2017).

amblee0306 commented 4 years ago

Okay. Thanks for the information! :)