HKUDS / GraphGPT

[SIGIR'2024] "GraphGPT: Graph Instruction Tuning for Large Language Models"
https://arxiv.org/abs/2310.13023
Apache License 2.0
493 stars, 36 forks

Ask for the construction of datasets #42

Closed: linwhitehat closed this issue 4 months ago

linwhitehat commented 6 months ago

Hi, I'm interested in this work, but I'm still unsure how the datasets were constructed, so I have some questions about the data used in the experiments. I would like to understand how the id, conversations, and graph fields are built when constructing the datasets for stage 1, stage 2, and evaluation. The data format seems to differ between stage 1, stage 2, and evaluation. Could you explain the format of the data samples for each of these three stages?

For example, take the following three samples:

  1. The sample from Jiabin99/GraphGPT-eval-instruction (evaluation)

  2. The sample from Jiabin99/Arxiv-PubMed-mix-NC-LP (stage 2)

  3. The sample from Jiabin99/graph-matching (stage 1)

linwhitehat commented 5 months ago

This is great work and I am interested in it! However, when I tried to use GPT-3.5 to generate answers for the graph-matching examples, I found that the graph input does not contain the node title and summary information that the prompt refers to. Could you explain how to generate the answers with GPT?

linwhitehat commented 5 months ago

Forgive me, I am not very familiar with this dataset: how did you construct all_pyg_graph_data.pt? The work is really interesting, and I would also like to know whether it can be applied to graph-level categorization.

tjb-tech commented 4 months ago


Hi, sorry for the late reply; we had several conference deadlines over the last few months. We only use CoT distillation to generate the instructions in stage 2 (downstream tasks), so you don't need GPT-3.5 to generate answers for graph matching (stage 1).

tjb-tech commented 4 months ago


Thanks for your interest! You can put all the PyG Data objects in a dict and save the dict in .pt format. As for graph-level categorization, feel free to try it!