Hi, we use the off-the-shelf embeddings from the SimTeG repo: https://github.com/vermouthdky/SimTeG (arxiv + products). For the Cora and PubMed datasets, we generated the embeddings ourselves following their code, as embeddings were not provided for these smaller datasets. Unfortunately, we didn't save the checkpoints for the SimTeG models; however, given the relatively small size of these datasets, the SimTeG training process didn't take much time.
If you prefer not to use the SimTeG embeddings, we also offer sbert/roberta embeddings, obtained with the 'all-MiniLM-L6-v2' and 'sentence-transformers/all-roberta-large-v1' models.
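For reference, here is a minimal sketch of how such embeddings can be produced with the sentence-transformers library; the node texts and the output filename are placeholders, not the repo's actual preprocessing script:

```python
from sentence_transformers import SentenceTransformer
import torch

# Raw node texts (e.g., title + abstract per paper) -- placeholder list.
texts = ["Paper title. Paper abstract ...", "Another paper ..."]

# Use 'all-MiniLM-L6-v2' for the sbert embeddings,
# or 'sentence-transformers/all-roberta-large-v1' for the roberta ones.
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(texts, convert_to_tensor=True, show_progress_bar=True)

# Assumed save path -- match whatever layout your LLaGA data loader expects.
torch.save(emb.cpu(), "sbert_x.pt")
```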
To obtain a general model for new embeddings, you can train LLaGA with the following commands:

```bash
./scripts/train_deepspeed.sh vicuna nc-lp-nd arxiv-products-pubmed-cora.3 4 {embedding}
./scripts/train_deepspeed.sh vicuna_4hop nc-lp-nd arxiv-products-pubmed-cora.3 4 {embedding}
```
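For example, assuming your new embeddings are registered under the name `sbert`, the first command would read `./scripts/train_deepspeed.sh vicuna nc-lp-nd arxiv-products-pubmed-cora.3 4 sbert`.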
When using the HO template, you may need to perform message passing on the new embeddings. This can be done with the generate_multi_hop_x function in utils/data_process.py (please pull the latest master).
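If you just want to see what this preprocessing amounts to, below is a minimal sketch of hop-wise mean aggregation. This is not the actual generate_multi_hop_x implementation, only an illustration of the idea; it assumes a PyG-style edge_index with both edge directions present:

```python
import torch

def multi_hop_features(x, edge_index, num_hops=4):
    """Hop-k features = mean of each node's neighbors' hop-(k-1) features.
    x: [num_nodes, dim] node embeddings; edge_index: [2, num_edges] (long)."""
    num_nodes = x.size(0)
    src, dst = edge_index
    # In-degree per node, clamped so isolated nodes don't divide by zero.
    deg = torch.zeros(num_nodes).index_add_(0, dst, torch.ones(dst.numel()))
    deg = deg.clamp(min=1).unsqueeze(1)
    hops = [x]
    for _ in range(num_hops):
        agg = torch.zeros_like(hops[-1])
        agg.index_add_(0, dst, hops[-1][src])  # sum neighbor features into each node
        hops.append(agg / deg)                 # mean aggregation
    return hops  # hops[0] is the raw embedding, hops[k] the k-hop features
```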
Thank you for your great reply! It is so kind of you. I am wondering whether you could release the checkpoints of the LLaGA models trained with embeddings like sbert or roberta. I noticed that you have done experiments with them in Appendix C.
In our paper, we only trained the sbert/roberta models in one setting (classification expert + HO template) to demonstrate the flexibility of our text-encoding method. If this setting aligns with your needs, feel free to use our models:
I just uploaded our sbert/roberta models to Hugging Face:
- Runjin/llaga-vicuna-7b-sbert-HO-classification_expert-linear-projector
- Runjin/llaga-vicuna-7b-roberta-HO-classification_expert-linear-projector
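In case it helps, these can be fetched like any other Hugging Face repo, e.g. with huggingface_hub; the local directory below is just a placeholder:

```python
from huggingface_hub import snapshot_download

# Download the sbert-variant checkpoint; swap in the roberta repo id as needed.
path = snapshot_download(
    repo_id="Runjin/llaga-vicuna-7b-sbert-HO-classification_expert-linear-projector",
    local_dir="./checkpoints/llaga-sbert",  # placeholder destination
)
print(path)
```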
Note that if you want to apply the model to other datasets and do zero-shot node classification, the node description task may be very helpful (as shown in Appendix B). But I think the classification expert can do zero-shot link prediction.
Got it. Thank you so much for your great work and kind help!
I noticed that the models you released all use SimTeG embeddings. I am wondering whether you trained a SimTeG model yourselves or just used the embeddings released by the SimTeG authors. I would appreciate it a lot if you could release the SimTeG checkpoint, if you trained it yourselves. If not, could you please tell me how I can get the SimTeG embeddings for other datasets?
Thank you!