amblee0306 closed this issue 4 years ago
Hello, we used the scoring Python code from DeepWalk to evaluate the generated embeddings: https://github.com/phanein/deepwalk
Hi there, thanks for the reply. The scoring function takes a .mat input file as well. Can I know how you generated the .mat file? Or is it possible to download it from somewhere for the DBLP dataset?
@ranahussein Alternatively, can you please make the labels of the nodes available? Edited: I just realised the labels are available; the file is just named actor_labels.txt instead of author_labels.txt.
Besides that, do you also have the feature vectors of the author nodes?
Hello, yes, these are the labeled nodes for the DBLP dataset. We don't use feature vectors; we only use the resulting embeddings. You can use the authors' labels to generate the sparse .mat file (e.g., see the uploaded .mat file for DBLP). In the scoring Python code, we only use the part of the .mat file corresponding to mat['group']. You will need to adjust your code, because not all authors in the graph have labels.
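For anyone following along, here is a minimal sketch of turning a label file into a sparse label matrix stored under the 'group' key. It is not the code used by the authors. It assumes each line of the label file looks like `<node_id> <integer_label>` separated by whitespace, which the thread does not confirm. The helper names `build_label_matrix` and `save_group_mat` are made up for illustration. Only labeled nodes get a row, so the row order returned in `node_ids` has to be matched against the embeddings when scoring.

```python
import numpy as np
from scipy.io import savemat
from scipy.sparse import csr_matrix


def build_label_matrix(label_lines):
    """Parse '<node_id> <label>' lines (assumed format) into a sparse 0/1 matrix.

    Returns (node_ids, group): node_ids is the sorted list of labeled node ids,
    and row i of group is the one-hot (or multi-hot) label row for node_ids[i].
    """
    pairs = set()
    for line in label_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        pairs.add((parts[0], int(parts[1])))
    pairs = sorted(pairs)

    node_ids = sorted({node for node, _ in pairs})
    classes = sorted({label for _, label in pairs})
    row_of = {node: i for i, node in enumerate(node_ids)}
    col_of = {label: j for j, label in enumerate(classes)}

    rows = [row_of[node] for node, _ in pairs]
    cols = [col_of[label] for _, label in pairs]
    data = np.ones(len(pairs), dtype=np.float64)
    group = csr_matrix((data, (rows, cols)),
                       shape=(len(node_ids), len(classes)))
    return node_ids, group


def save_group_mat(path, group):
    """Write the label matrix to a .mat file under the 'group' key."""
    savemat(path, {"group": group})


# Tiny in-memory example; real input would be the lines of actor_labels.txt.
sample_lines = ["a1 0", "a2 2", "a3 0"]
node_ids, group = build_label_matrix(sample_lines)
```

To use it on the real file, read the lines of actor_labels.txt, call `build_label_matrix`, then `save_group_mat("dblp_labels.mat", group)` (the output filename is arbitrary). Keep `node_ids` around so that you only score the embeddings of authors that actually have a label.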
@ranahussein Thanks for the clarification. Can I know how I can get the feature vectors for the author nodes, even though you don't use them in your model? I saw that some other DBLP datasets have the author feature vectors, but I'm afraid the ordering of the data is different from what you are using, so I can't use the datasets together. I want to compare the embeddings with the given feature vectors. Thanks!
Unfortunately I don't have this. Maybe you can refer to the original authors of the dataset; you will find their reference in the paper/GitHub.
@ranahussein Can I know how you preprocessed the DBLP dataset from the original dataset? I'm trying to work out how to extract exactly the 5915 author nodes that you are using in your dataset.
This was done by the authors of these papers:
Huang, Zhipeng, et al. "Meta structure: Computing relevance in large heterogeneous information networks." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
Huang, Zhipeng, and Nikos Mamoulis. "Heterogeneous information network embedding for meta path based proximity." arXiv preprint arXiv:1701.05291 (2017).
Okays. Thanks for the information! :)
Hi @ranahussein, is it possible to obtain the part of the code where you performed the node classification task using the generated embeddings? Thanks :)