HKUDS / GraphGPT

[SIGIR'2024] "GraphGPT: Graph Instruction Tuning for Large Language Models"
https://arxiv.org/abs/2310.13023
Apache License 2.0

The results of baselines are not reasonable and not correct (Clarified by author) #52

Closed AGTSAAA closed 4 months ago

AGTSAAA commented 4 months ago

For the ogbn-arxiv supervised setting, the results of GCN, GraphSAGE, and other GNN methods should be higher than 70. This is well-known in the community.

tjb-tech commented 4 months ago

> For the ogbn-Arxiv supervised setting, the results of GCN, Graphsage, and other GNN methods on ogbn-arxiv should be higher than 70. This is well-known in the community.

Hi, we have replied to this in issue #2. Please refer to that issue.

AGTSAAA commented 4 months ago

Many others [1,2] have also tried BERT features with GNNs on ogbn-arxiv, and all of them achieved high performance (72-76), depending on the BERT model. Your reported ~50 is exceptionally low.

[1] SimTG: A Frustratingly Simple Approach For Textual Graph Representation Learning. [2] Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs.

tjb-tech commented 4 months ago


BERT has many model versions. We use this one: https://huggingface.co/google/bert_uncased_L-2_H-128_A-2. Feel free to use our data to run the baselines.
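For reference, a minimal sketch of extracting 128-dim node features with this checkpoint (mean pooling over tokens is my assumption; the repo's own pre-processing may pool differently, e.g. via the [CLS] token):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Tiny BERT: 2 layers, hidden size 128 -- matches the 128-dim OGB features.
CKPT = "google/bert_uncased_L-2_H-128_A-2"
tok = AutoTokenizer.from_pretrained(CKPT)
model = AutoModel.from_pretrained(CKPT).eval()

def embed(texts):
    """Mean-pool the last hidden states into one 128-dim vector per text."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (B, T, 128)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # (B, 128)

feats = embed(["GraphGPT: Graph Instruction Tuning for Large Language Models"])
```

The resulting 128-dim vectors can then be used as the initial node feature matrix for any GNN baseline.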

AGTSAAA commented 4 months ago

Thank you for your reply.

  1. I still believe that even with the used BERT, the performance would not be so low. I will check this by running the code in [1,2].
  2. If GNNs with other BERT models in [1,2] can outperform GraphGPT, how can we say GraphGPT can beat simple GNNs?

[1] SimTG: A Frustratingly Simple Approach For Textual Graph Representation Learning. [2] Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs.

tjb-tech commented 4 months ago

> Thank you for your reply.
>
> 1. I still believe that even with the used BERT, the performance would not be so low. I will check this by running the code in [1,2].
> 2. If GNNs with other BERT models in [1,2] can outperform your methods, how can we say GraphGPT can beat simple GNNs?
>
> [1] SimTG: A Frustratingly Simple Approach For Textual Graph Representation Learning. [2] Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs.

We use the same node features, processed by the same BERT model (https://huggingface.co/google/bert_uncased_L-2_H-128_A-2), for both the baselines and our GraphGPT. So the comparison is fair; it is only a matter of initial node features. The methods you mention tend to enhance the node representations with language models or other techniques. The BERT models they use may be more powerful than ours, but that is only a difference in data pre-processing and initial experimental settings. You could use our data to run the baselines like GAT, GCN, etc. to verify our experiments. By the way, the results further suggest that our GraphGPT is less sensitive to the quality of node features than traditional GNNs.

AGTSAAA commented 4 months ago

This is not the case. Your input for GraphGPT also includes the abstract and title, i.e., raw text.

For fairness, one can actually use any BERT model to embed this raw text for the GNN. If we feed the raw text (embedded) into a GNN, we can achieve an accuracy higher than 70%.

tjb-tech commented 4 months ago

> This is not the case. Your input for GraphGPT also includes the abstract and title, i.e., raw text.
>
> For fairness, one can actually use any BERT model to embed this raw text for GNN. If we input the raw text into GNN, we can achieve an accuracy higher than 70%

Thanks for your questions. But traditional GNNs cannot directly consume raw text; they can only take node features as input. GraphGPT can leverage the raw text because of the capabilities of LLMs.
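To make the distinction concrete, a single GCN layer consumes only a fixed-size feature matrix `X`, never raw text. A numpy sketch (illustrative only, not the repo's implementation):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN propagation step: ReLU(D^-1/2 (A+I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])                       # add self-loops
    d_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)      # symmetric normalization
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # 4-node path graph
X = rng.normal(size=(4, 128))               # 128-dim node features (e.g. tiny-BERT)
H = gcn_layer(A, X, rng.normal(size=(128, 16)))   # -> (4, 16) hidden states
```

Whatever text processing happens (BERT, bag-of-words, skip-gram) must happen before this step, which is exactly the pre-processing choice being debated here.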

AGTSAAA commented 4 months ago

This is still not the case. Even with simple bag-of-words features of the text, GNNs can achieve performance higher than 70.

[1] Language is All a Graph Needs
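A minimal sketch of the kind of bag-of-words featurization under discussion (pure Python; the tokenization and vocabulary size are arbitrary illustrative choices, not taken from either paper):

```python
import re
from collections import Counter

def bow_features(docs, vocab_size=1000):
    """Map each document to a vector of term counts over a shared vocabulary."""
    tokenize = lambda d: re.findall(r"[a-z]+", d.lower())
    # Vocabulary = most frequent tokens across the whole corpus.
    counts = Counter(t for d in docs for t in tokenize(d))
    vocab = {w: i for i, (w, _) in enumerate(counts.most_common(vocab_size))}
    feats = []
    for d in docs:
        vec = [0] * len(vocab)
        for t in tokenize(d):
            if t in vocab:
                vec[vocab[t]] += 1
        feats.append(vec)
    return feats, vocab

feats, vocab = bow_features(["graph neural networks", "graph instruction tuning"])
```

Such count vectors are training-free and can serve directly as initial node features for a GNN.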

tjb-tech commented 4 months ago


Thanks for your questions. We'd like to clarify that we use the same node features for both the baseline GNNs and GraphGPT. In academic experiments, this is the commonly used controlled-variable method to validate the effectiveness of a model architecture. In our experiments, we keep the data pre-processing identical, i.e., the same node features throughout.

AGTSAAA commented 4 months ago

In GraphGPT, you utilize raw text. However, you chose a very weak BERT model for feature extraction for the GNNs, which is even worse than bag-of-words features over the raw text. This is not reasonable.

tjb-tech commented 4 months ago


Thanks for your question! But actually, Language is All a Graph Needs uses skip-gram rather than bag-of-words features for arxiv.


And referring to the original OGB website, they run the skip-gram model over the MAG corpus. So skip-gram requires training, whereas we use a pre-trained BERT (training-free) for data pre-processing. So to some extent, the original OGBN-Arxiv features are optimized for the MAG corpus.
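For contrast with a training-free encoder, skip-gram embeddings are trained on (center, context) word pairs drawn from the corpus. A sketch of the pair-extraction step (illustrative only, not OGB's actual pipeline):

```python
def skipgram_pairs(tokens, window=2):
    """Enumerate (center, context) training pairs for a skip-gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # every neighbor within the window, excluding the center
                pairs.append((center, tokens[j]))
    return pairs

# Each pair becomes one training example for the embedding model.
pairs = skipgram_pairs("graph instruction tuning for llms".split(), window=1)
```

Because the embeddings are fit to the corpus itself, the resulting features are tuned to that corpus, which is the point being made about the default OGB features.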


And we clarify again that our experiments are fair and reasonable, because we use the same node features throughout.

AGTSAAA commented 4 months ago

Thank you for your reply.

But you chose a very weak BERT model for the GNNs. Could you explain why you did not choose the BERT models widely used in previous work? My point is that you utilize raw text for GraphGPT; thus, we could actually utilize any pre-trained BERT model (all of which are training-free) for the GNNs.

tjb-tech commented 4 months ago

> Thank you for your reply.
>
> But you chose a very weak BERT model for GNN. Could you explain why you didn't choose the BERT methods that were widely used in previous work? My point is that you utilize raw text for GraphGPT. Thus, we can actually utilize any pre-trained BERT model (which are all training-free) for GNN.

Thanks for your questions! The original reason we chose this model is straightforward: we wanted to keep the dimension of our data (i.e., 128) the same as the original OGB features. Most of the features you mention above are 768-dimensional, so the higher accuracy is understandable. Also, the motivation of our GraphGPT is different from the works you mention: our goal is to explore a paradigm for combining LLMs with graphs, rather than how to construct more powerful node features for graphs.