CurryTang / Graph-LLM

Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs

Understanding the splits #13

Closed: amangupt01 closed this issue 7 months ago

amangupt01 commented 7 months ago

Thanks for sharing the code base and datasets! We enjoyed reading the paper.

I have some confusion regarding the splits used in the code. Can you please explain the difference between "pl_fixed", "fixed", "pl_random" and "random"? Are they related to the low_label and high_label settings mentioned in the paper?

When we set the low_labels_test argument to 1 with the random split, the code still uses a 60-20 split of the dataset, which contradicts what is described in the paper. Am I missing something here?

Thanks in advance!

CurryTang commented 7 months ago

Hi, "fixed" refers to the low label rate setting mentioned in the paper (20 labels per class; it is called "fixed" because this is usually the official split), while "random" refers to the 60% setting. The "pl" prefix means the pseudo labels given by TAPE are used. low_labels_test is related to the few-shot setting mentioned in the annotation part.
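
To make the difference between the two settings concrete, here is a minimal sketch and not the repository's actual code. The function names `low_label_split` and `random_ratio_split` are hypothetical. The 20% validation share is an assumption taken from the "60-20" figure in the question. Sampling the 20 nodes per class at random is also an assumption for illustration, since a real "fixed" split would normally be read from the dataset's official masks.

```python
import random
from collections import defaultdict


def low_label_split(labels, per_class=20, seed=0):
    """Low-label-rate setting: keep `per_class` training nodes for each class.

    Assumption: nodes are sampled at random here; an official 'fixed' split
    would instead be loaded from the dataset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train = []
    for y in sorted(by_class):
        nodes = list(by_class[y])
        rng.shuffle(nodes)
        train.extend(nodes[:per_class])
    return sorted(train)


def random_ratio_split(num_nodes, train_ratio=0.6, val_ratio=0.2, seed=0):
    """High-label-rate setting: random train/val/test split by ratio.

    Assumption: 60% train, 20% validation, and the remainder as test.
    """
    rng = random.Random(seed)
    perm = list(range(num_nodes))
    rng.shuffle(perm)
    n_train = round(train_ratio * num_nodes)
    n_val = round(val_ratio * num_nodes)
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Toy labels: 300 nodes spread evenly over 3 classes.
    toy_labels = [i % 3 for i in range(300)]
    print(len(low_label_split(toy_labels)))              # 3 classes * 20 labels
    print([len(part) for part in random_ratio_split(100)])
```

In this sketch the training set size in the low-label setting depends only on the number of classes, while in the ratio setting it grows with the size of the graph.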