
GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks

Code for our paper GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks, published at The Web Conference 2024.

Authors: Mengmei Zhang, Mingwei Sun, Peng Wang, Shen Fan, Yanhu Mo, Xiaoxiao Xu, Hong Liu, Cheng Yang, Chuan Shi

Model Pipeline

(Figure: overview of the GraphTranslator model pipeline.)

Installation

We ran our experiments with the following settings.

The file ./requirements.txt lists all Python libraries that GraphTranslator depends on. You can set up the environment and install them with:

conda create -n graphtranslator python=3.9
conda activate graphtranslator
git clone https://github.com/alibaba/GraphTranslator.git
cd GraphTranslator/
pip install -r requirements.txt
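
As an optional sanity check after installation, you can confirm that the core dependencies import correctly. This is a minimal sketch; it assumes torch and transformers are installed via requirements.txt.

# check_env.py -- quick verification of the installed environment
import torch
import transformers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)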

Datasets & Models

Download the datasets and model checkpoints used in this project from Hugging Face.

ArXiv Dataset

Download the files bert_node_embeddings.pt, graphsage_node_embeddings.pt, and titleabs.tsv from the Hualouz/GraphTranslator-arxiv dataset repository and place them in ./data/arxiv:

cd ./data/arxiv
git lfs install
git clone git@hf.co:datasets/Hualouz/GraphTranslator-arxiv
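
To verify that the download completed, you can load the embedding files with PyTorch. This is a minimal sketch; it assumes the .pt files are standard torch-serialized objects.

# verify_arxiv_data.py -- inspect the downloaded node embeddings
import torch

bert_emb = torch.load("bert_node_embeddings.pt", map_location="cpu")
sage_emb = torch.load("graphsage_node_embeddings.pt", map_location="cpu")
print(type(bert_emb), getattr(bert_emb, "shape", None))
print(type(sage_emb), getattr(sage_emb, "shape", None))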

Translator Model

Download bert-base-uncased.zip from the Hualouz/Qformer repository and unzip it into ./Translator/models:

cd Translator/models/
git lfs install
git clone git@hf.co:Hualouz/Qformer
unzip bert-base-uncased.zip
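
To confirm the unzipped BERT weights are usable, you can load them from the local directory with the standard transformers API. This is a sketch; it assumes the archive unzips to a bert-base-uncased/ directory under ./Translator/models.

# verify_bert.py -- load the local bert-base-uncased weights
from transformers import BertModel, BertTokenizer

path = "./Translator/models/bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(path)
model = BertModel.from_pretrained(path)
inputs = tokenizer("GraphTranslator aligns graph models with LLMs.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)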

ChatGLM2-6B Model

Download the ChatGLM2-6B model from THUDM/chatglm2-6b and place it in ./Translator/models:

cd ./Translator/models
git lfs install
git clone git@hf.co:THUDM/chatglm2-6b
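
You can verify the ChatGLM2-6B download by loading it with transformers, following the usage shown on the model card. This is a minimal sketch; it requires a GPU with enough memory for the half-precision weights.

# verify_chatglm2.py -- load the local ChatGLM2-6B checkpoint
from transformers import AutoModel, AutoTokenizer

path = "./Translator/models/chatglm2-6b"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(path, trust_remote_code=True).half().cuda()
model = model.eval()
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)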

Run

Producer Phase

cd ./Producer/inference
python producer.py

Training Phase

Train the Translator model with the prepared ArXiv dataset.

Train the Translator for GraphModel-Text alignment. The training configurations are in the file ./Translator/train/pretrain_arxiv_stage1.yaml.

cd ./Translator/train
python train.py --cfg-path ./pretrain_arxiv_stage1.yaml

After stage 1, you will get a model checkpoint stored in ./Translator/model_output/pretrain_arxiv_stage1/checkpoint_0.pth.

Train the Translator for GraphModel-LLM alignment. The training configurations are in the file ./Translator/train/pretrain_arxiv_stage2.yaml.

cd ./Translator/train
python train.py --cfg-path ./pretrain_arxiv_stage2.yaml

After stage 2, you will get a model checkpoint stored in ./Translator/model_output/pretrain_arxiv_stage2/checkpoint_0.pth.

After all the training stages, you will have a model checkpoint that can translate GraphModel information into representations that the LLM can understand.
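
If you want to inspect a trained checkpoint before moving on to generation, it can be opened with torch.load. This is a sketch; it assumes the checkpoint is a standard PyTorch-serialized dict (the top-level keys may differ).

# inspect_checkpoint.py -- peek at the stage 2 checkpoint contents
import torch

ckpt = torch.load(
    "./Translator/model_output/pretrain_arxiv_stage2/checkpoint_0.pth",
    map_location="cpu",
)
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])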

Generate Phase

cd ./Translator/train
python generate.py

The generated prediction results will be saved in ./data/arxiv/pred.txt.
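
To take a quick look at the first few generated predictions, you can print the raw lines of the output file. This sketch makes no assumption about the file's internal format.

# peek_predictions.py -- print the first five lines of pred.txt
with open("./data/arxiv/pred.txt", encoding="utf-8") as f:
    for i, line in enumerate(f):
        print(line.rstrip())
        if i >= 4:
            break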

Evaluation

Evaluate the accuracy of the generated predictions.

cd ./Evaluate
python eval.py

Citation

@inproceedings{zhang2024graphtranslator,
  title={GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks},
  author={Zhang, Mengmei and Sun, Mingwei and Wang, Peng and Fan, Shen and Mo, Yanhu and Xu, Xiaoxiao and Liu, Hong and Yang, Cheng and Shi, Chuan},
  booktitle={Proceedings of the ACM on Web Conference 2024},
  pages={1003--1014},
  year={2024}
}

Acknowledgements

Thanks to all the prior works that we built on and that inspired us.