cornell-zhang / GraphZoom

GraphZoom: A Multi-level Spectral Approach for Accurate and Scalable Graph Embedding
BSD 3-Clause "New" or "Revised" License

Inquiry about the embedding space #17

Closed ghost closed 2 years ago

ghost commented 3 years ago

Dear Zhang,

I am running the experiments from your code. I obtain the embedding representations and then compute the L2 distance as the kernel for node classification and link prediction, but the performance is poor. Could you recommend a suitable embedding space, please?

Thanks Wei

Chenhui1016 commented 3 years ago

Hi,

Thanks for your interest! Can you please elaborate on your question? Are you using GraphZoom to obtain the node embeddings? Which dataset are you using? In addition, can you explain how you compute the L2 distance for node classification/link prediction?

Thanks, Chenhui

ghost commented 3 years ago

Hi,

Thanks for your quick reply. I embed the Yelp data from 'Accelerated Attributed Network Embedding' using GraphZoom with GraphSAGE + LAMG. After obtaining each node embedding, I compute the pairwise L2 distances and generate a kernel matrix for node classification and link prediction.
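For concreteness, the kernel construction described above can be sketched as follows. This is not code from the GraphZoom repo; it is a minimal NumPy example that turns pairwise L2 distances between node embeddings into an RBF-style kernel matrix, with `emb` and `gamma` as placeholder inputs.

```python
import numpy as np

def l2_kernel(emb, gamma=1.0):
    """Return K[i, j] = exp(-gamma * ||emb_i - emb_j||^2) for rows of emb."""
    sq_norms = (emb ** 2).sum(axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed without explicit loops
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * emb @ emb.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from round-off
    return np.exp(-gamma * sq_dists)

emb = np.random.default_rng(0).normal(size=(5, 16))  # toy (n_nodes, dim) embeddings
K = l2_kernel(emb)  # symmetric kernel matrix with ones on the diagonal
```

Such a kernel matrix could then be fed to a kernel-based classifier; as the maintainer notes below in the thread, this is not how GraphZoom itself is evaluated.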

Thanks Wei

Chenhui1016 commented 3 years ago

We have never used the L2 distance and kernel matrix you mentioned for label prediction. Have you tried logistic regression for label prediction after obtaining the node embeddings? If the accuracy with logistic regression is still low, can you also show me the table that LAMG produces during coarsening?
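The suggested evaluation can be sketched like this: fit scikit-learn's `LogisticRegression` on the node embeddings and report micro/macro F1. The `emb` and `labels` arrays below are stand-in random data, not the actual Yelp/M10 embeddings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 32))          # placeholder node embeddings
labels = rng.integers(0, 4, size=200)     # placeholder class labels

X_tr, X_te, y_tr, y_te = train_test_split(
    emb, labels, test_size=0.3, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)
micro = f1_score(y_te, pred, average="micro")
macro = f1_score(y_te, pred, average="macro")
```

Evaluating with a linear classifier on frozen embeddings is the standard protocol used in most embedding papers, which makes the numbers comparable to the reported baselines.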

Btw, you can also tune some of the GraphSAGE hyperparameters to improve accuracy (e.g., max_total_steps, hidden dimension, and learning rate).

williamweiwu commented 3 years ago

Hi,

As you suggested, I tried classifying the M10 data via logistic regression and computed the micro-F1 and macro-F1 scores. They are still low. Could you give me any suggestions, please? I have attached the table.

Thanks! Wei

graphzoom_m10
Chenhui1016 commented 3 years ago

Hi,

The table you showed is from the fusion step. Can you please also show me the LAMG table from the coarsening phase? What reduction ratio are you using for coarsening? Btw, what F1 score do you get with GraphZoom, and what is the F1 score of the baselines (e.g., GraphSAGE)?

williamweiwu commented 3 years ago

Do you mean the walk in the attached picture?

graphzoom_m10_v1 graphzoom_m10_v2

The reduction ratio is 2.

graphzoom_m10_parameters

The results, compared with GraphSAGE, are as follows:

graphzoom_m10_f1

Thanks very much! Wei

Chenhui1016 commented 3 years ago

Thanks for the information. For the GraphSAGE baseline, are you using the same code as in GraphZoom, with fusion, coarsening, and refinement disabled? If so, would you mind sharing this dataset with me so I can check?

williamweiwu commented 3 years ago

No, the GraphSAGE code comes from stellargraph: https://stellargraph.readthedocs.io/en/stable/

Link: https://pan.baidu.com/s/1mfk9KnYR6HtjzC0-JSKBXw (extraction code: 2gb2)

You can use to_npy.py to generate m10-feats.npy, since the file is large.

Note that in the M10 dataset, only the first 10310 nodes are labelled. Therefore, we embed the whole network, including all nodes, but classify only the first 10310 labelled nodes.
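The evaluation setup described above (embed everything, classify only the labelled prefix) can be sketched as follows. The arrays here are placeholders; only the 10310 figure comes from the dataset description above.

```python
import numpy as np

N_LABELED = 10310  # per the dataset description, only the first 10310 nodes have labels

rng = np.random.default_rng(0)
emb = rng.normal(size=(12000, 8))            # embeddings computed on the whole graph
labels = rng.integers(0, 3, size=N_LABELED)  # labels exist only for the first block

# Restrict classification to the labelled nodes; the rest still contribute
# structure (edges, features) to the embedding step.
emb_labeled = emb[:N_LABELED]
```

Keeping the unlabelled nodes in the graph during embedding is what makes this a transductive setup: their edges still shape the labelled nodes' representations.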

Thanks very much.

Wei

Chenhui1016 commented 3 years ago

I see. In this case I think the main reason may be that the default GraphSAGE hyperparameters in our code are not as well tuned for the Yelp dataset as those in the stellargraph GraphSAGE code you used. I would suggest integrating the GraphSAGE model from stellargraph (with the same hyperparameters) into GraphZoom and then evaluating the F1 score for a fair comparison. If this new GraphZoom + GraphSAGE still has a low F1, I can help run your dataset.

williamweiwu commented 3 years ago

Could you tell me the key hyperparameters, please? I will prioritize tuning those.

Thanks Wei

Chenhui1016 commented 3 years ago

I would suggest tuning "max_total_steps", "learning rate", "neg_sample_size", and "hidden dimension". Btw, you are using the unsupervised version (instead of the supervised version) of GraphSAGE from stellargraph, right?

williamweiwu commented 3 years ago

OMG, I used the supervised version from stellargraph. That is the key reason. However, another learning-based algorithm (not a GNN) can achieve an F1 score above 0.7, which still outperforms GraphZoom.

Thanks for your great time and help. Wei

Chenhui1016 commented 3 years ago

Good to know you found the reason. Note that GraphZoom is just a framework, so you can plug in whatever graph embedding model you want. If another learning-based algorithm outperforms our GraphZoom + GraphSAGE, you can plug that algorithm into GraphZoom, which should also achieve higher accuracy.