HKUDS / GraphGPT

[SIGIR'2024] "GraphGPT: Graph Instruction Tuning for Large Language Models"
https://arxiv.org/abs/2310.13023
Apache License 2.0

Training data issues #73

Open zhuochunli opened 1 month ago

zhuochunli commented 1 month ago

I found that the training data for stage 1 and stage 2 is each missing some of the benchmark datasets. Can you help explain this? Thank you!

graph_matching.json in stage 1: Counter({'Industrial': 84003, 'arxiv': 74075, 'cora': 25120})

Arxiv-PubMed-mix-NC-LP.json in stage 2: Counter({'arxiv': 94441, 'pubmed': 73660})
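
For reference, counts of this kind can be produced with something like the sketch below. It is only an illustration under assumptions: it supposes each file is a JSON list of records and that every record has an `id` field whose text before the first underscore is the dataset name (for example `arxiv_train_0`). The field name, the prefix format, and the helper names `count_datasets` and `count_datasets_in_file` are hypothetical and not confirmed from the repository.

```python
import json
from collections import Counter


def count_datasets(records, key="id"):
    """Count records per dataset, using the text before the first '_' in rec[key].

    The `key` field and its '<dataset>_...' format are assumptions for illustration.
    """
    return Counter(str(rec[key]).split("_", 1)[0] for rec in records)


def count_datasets_in_file(path, key="id"):
    # Assumes the file contains a single JSON list of dicts.
    with open(path, encoding="utf-8") as f:
        return count_datasets(json.load(f), key=key)


# Toy records standing in for real file contents (hypothetical values).
sample = [
    {"id": "arxiv_train_0"},
    {"id": "arxiv_train_1"},
    {"id": "cora_train_0"},
    {"id": "Industrial_train_0"},
]
print(count_datasets(sample))
```

Running `count_datasets_in_file("graph_matching.json")` would then give a per-dataset tally, provided the records really follow the assumed layout.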

What is the "Industrial" benchmark? I didn't see that name anywhere in the paper. Also, why does graph_matching.json lack PubMed, while Arxiv-PubMed-mix-NC-LP.json has no Cora?