Dear Zhang,

I am now running experiments with your code. I obtain the node embeddings and then compute pairwise L2 distances as a kernel for node classification and link prediction, but the performance is poor. Could you please recommend how best to use the embedding space?

Thanks, Wei

Hi,
Thanks for your interest! Can you please elaborate on your question? Are you using GraphZoom to obtain the node embeddings? Which dataset are you using? In addition, can you explain how you compute the L2 distance for node classification/link prediction?
Thanks, Chenhui
Hi,
Thanks for your quick reply. I embedded the Yelp data from 'Accelerated Attributed Network Embedding' using GraphZoom with GraphSAGE + LAMG. After obtaining the node embeddings, I compute the L2 distance for each pair of nodes and build a kernel matrix for node classification and link prediction.
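Concretely, something like this minimal sketch (the file name is illustrative, and the RBF conversion is just one common way to turn distances into a kernel):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Node embeddings produced by GraphZoom, one row per node
emb = np.load("m10-embeddings.npy")  # illustrative file name

# Pairwise L2 distances between all node embeddings
dists = cdist(emb, emb, metric="euclidean")

# Convert distances to a similarity kernel (RBF, as one common choice)
gamma = 1.0 / emb.shape[1]
K = np.exp(-gamma * dists ** 2)
```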
Thanks Wei
We have never used the L2 distance and kernel matrix you mention for label prediction. Have you tried logistic regression for label prediction after obtaining the node embeddings? If the accuracy with logistic regression is still low, can you also show me the table that LAMG produces during coarsening?
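For reference, a minimal sketch of that evaluation with scikit-learn (file names and the train/test split are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

emb = np.load("m10-embeddings.npy")  # node embeddings (illustrative file name)
labels = np.load("m10-labels.npy")   # node labels (illustrative file name)

X_train, X_test, y_train, y_test = train_test_split(
    emb, labels, test_size=0.5, random_state=0
)

# Train a logistic-regression classifier directly on the embeddings
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

print("micro-F1:", f1_score(y_test, pred, average="micro"))
print("macro-F1:", f1_score(y_test, pred, average="macro"))
```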
Btw, you can also tune some hyperparameters in GraphSAGE to improve the accuracy (e.g., max_total_steps, hidden dimension, and learning rate).
Hi,
As you suggested, I classified the m10 data via logistic regression and computed the micro-F1 and macro-F1 scores. They are still low. Could you please give me some suggestions? I have attached the table.
Thanks! Wei
Hi,
The table you showed is from the fusion step. Can you please also show me the LAMG table from the coarsening phase? What reduction ratio are you using for coarsening? Btw, what F1 score do you get with GraphZoom, and what is the F1 score of the baselines (e.g., GraphSAGE)?
Do you mean the table in the attached picture?
The reduction ratio is 2.
The results, compared with GraphSAGE, are as follows.
Thanks very much! Wei
Thanks for the information. For the GraphSAGE baseline, are you using the same code as in GraphZoom, with fusion, coarsening, and refinement disabled? If so, would you mind sharing the dataset with me so I can check?
No, the GraphSAGE code comes from stellargraph: https://stellargraph.readthedocs.io/en/stable/
Link: https://pan.baidu.com/s/1mfk9KnYR6HtjzC0-JSKBXw (extraction code: 2gb2)
You can use to_npy.py to generate m10-feats.npy, since that file is large.
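The conversion is roughly along these lines (a sketch; the exact file names in the archive may differ):

```python
# Sketch of what to_npy.py does: load the plain-text feature matrix and
# save it in binary .npy form (file names here are illustrative).
import numpy as np

feats = np.loadtxt("m10-feats.txt")  # illustrative source file name
np.save("m10-feats.npy", feats)
```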
Note that in the m10 dataset, only the first 10310 nodes are labelled. Therefore, we run network embedding on the whole network, including all nodes, but classify only the first 10310 labelled nodes.
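In code, the evaluation slice looks roughly like this (file names are illustrative):

```python
import numpy as np

N_LABELLED = 10310  # only the first 10310 nodes of m10 carry labels

emb = np.load("m10-embeddings.npy")  # embeddings for the whole network (illustrative name)
labels = np.load("m10-labels.npy")   # labels for the first 10310 nodes (illustrative name)

# Embed all nodes, but classify only the labelled prefix
X = emb[:N_LABELLED]
y = labels[:N_LABELLED]
```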
Thanks very much.
Wei
I see. In that case, I think the main reason may be that the default GraphSAGE hyperparameters in our code are not as good as those in the stellargraph GraphSAGE code you used for the Yelp dataset. I would suggest integrating the GraphSAGE model from stellargraph (including the same hyperparameters) into GraphZoom and then evaluating the F1 score for a fair comparison. If this new GraphZoom + GraphSAGE still has a low F1 score, I can help run your dataset.
Could you please tell me the key parameters? I will prioritize tuning those.
Thanks Wei
I would suggest tuning max_total_steps, learning rate, neg_sample_size, and hidden dimension. Btw, you are using the unsupervised version (rather than the supervised version) of GraphSAGE from stellargraph, right?
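For reference, the unsupervised workflow in stellargraph looks roughly like this (a condensed sketch following their demo; the hyperparameter values are illustrative, not tuned recommendations):

```python
from stellargraph.data import UnsupervisedSampler
from stellargraph.mapper import GraphSAGELinkGenerator, GraphSAGENodeGenerator
from stellargraph.layer import GraphSAGE, link_classification
from tensorflow import keras

# G is assumed to be a StellarGraph with node features attached
nodes = list(G.nodes())
sampler = UnsupervisedSampler(G, nodes=nodes, length=5, number_of_walks=1)

batch_size, num_samples = 50, [10, 5]
generator = GraphSAGELinkGenerator(G, batch_size, num_samples)
train_gen = generator.flow(sampler)

# Two-layer GraphSAGE encoder trained on a link-prediction objective
graphsage = GraphSAGE(layer_sizes=[128, 128], generator=generator, bias=True, dropout=0.0)
x_inp, x_out = graphsage.in_out_tensors()
prediction = link_classification(
    output_dim=1, output_act="sigmoid", edge_embedding_method="ip"
)(x_out)

model = keras.Model(inputs=x_inp, outputs=prediction)
model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss=keras.losses.binary_crossentropy,
    metrics=[keras.metrics.binary_accuracy],
)
model.fit(train_gen, epochs=5)

# Extract node embeddings from the trained source-node encoder
embedding_model = keras.Model(inputs=x_inp[0::2], outputs=x_out[0])
node_gen = GraphSAGENodeGenerator(G, batch_size, num_samples).flow(nodes)
node_embeddings = embedding_model.predict(node_gen)
```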
OMG, I used the supervised version from stellargraph. That is the key reason. However, another learning-based algorithm (not a GNN) can achieve an F1 score above 0.7, which still outperforms GraphZoom.
Thanks so much for your time and help. Wei
Good to know you found the reason. Note that GraphZoom is just a framework, and you can plug in whatever graph embedding model you want. If another learning-based algorithm outperforms our GraphZoom + GraphSAGE, you can plug that algorithm into GraphZoom, which should also achieve higher accuracy.
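Schematically, the pipeline is just this kind of skeleton (the function signature below is illustrative, not GraphZoom's actual API):

```python
from typing import Any, Callable

# Illustrative skeleton: the embedding step is a parameter, so any base
# model (GraphSAGE, DeepWalk, node2vec, ...) can be plugged in unchanged.
def graphzoom_pipeline(
    graph: Any,
    fuse: Callable,      # graph fusion step
    coarsen: Callable,   # spectral (LAMG) coarsening step
    embed_fn: Callable,  # any node-embedding model
    refine: Callable,    # embedding refinement step
    ratio: int = 2,
):
    fused = fuse(graph)
    coarse, mappings = coarsen(fused, ratio)
    coarse_emb = embed_fn(coarse)          # embed the much smaller coarse graph
    return refine(coarse_emb, mappings)    # map embeddings back to all original nodes
```

Swapping in a stronger base model should therefore carry its accuracy advantage through the whole pipeline.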