CurryTang / Graph-LLM

Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs

Question about cora and cora_full. #16

Open guanfaqian opened 5 months ago

guanfaqian commented 5 months ago

Thanks for sharing the code and datasets! I have some questions about cora:

Following your answer in https://github.com/CurryTang/Graph-LLM/issues/7:

In the cora dataset, I found that some entries have only a title and no content.

Later, I found that you shared a cora_full file on your Google Drive, so I wanted to use cora_full instead of cora.

However, your cora_full file only contains Data(raw_texts=[28402], edge_index=[2, 97230], y=[28402]), with no node features. I considered loading Cora-Full myself and aligning it with the text file you provided, but the node counts don't match either: your file has 28402 nodes, while the commonly used Cora-Full has 19793.
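To make a node-count mismatch like this concrete, it can help to check the fields of the graph object against each other before comparing with another release. The sketch below is a minimal, standard-library version of such a check. It assumes a hypothetical `TextGraph` stand-in with the same three fields as the `Data` object above (`raw_texts`, `edge_index` as two rows, `y`); it is not part of the repository, and with the real `.pt` file you would read these fields from the loaded object instead.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextGraph:
    """Hypothetical stand-in for a PyG Data object with raw_texts, edge_index, y."""
    raw_texts: List[str]
    edge_index: List[List[int]]  # two rows: source ids and target ids
    y: List[int]


def check_sizes(g: TextGraph) -> Dict[str, int]:
    """Report node/edge counts, raising if the fields disagree with each other."""
    num_nodes = len(g.raw_texts)  # one text per node
    src, dst = g.edge_index
    if len(src) != len(dst):
        raise ValueError("edge_index rows have different lengths")
    if len(g.y) != num_nodes:
        raise ValueError(f"{len(g.y)} labels for {num_nodes} texts")
    max_id = max(src + dst) if src else -1
    if max_id >= num_nodes:
        raise ValueError(f"edge endpoint {max_id} >= num_nodes {num_nodes}")
    touched = set(src) | set(dst)  # nodes that appear in at least one edge
    return {
        "num_nodes": num_nodes,
        "num_edges": len(src),
        "num_isolated": num_nodes - len(touched),
        "num_classes": len(set(g.y)),
    }


if __name__ == "__main__":
    toy = TextGraph(["a", "b", "c", "d"], [[0], [1]], [0, 0, 1, 2])
    print(check_sizes(toy))
```

If the counts are internally consistent, a mismatch such as 28402 vs. 19793 points to a different node set (for example, a different filtering of the raw papers) rather than a corrupted file.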

How did you generate your cora_full.pt file? Alternatively, could you provide more complete data (e.g., node features)?

CurryTang commented 5 months ago

Hi, you can access the Cora-Full dataset here: https://people.cs.umass.edu/~mccallum/data.html

jujulili888 commented 3 months ago

Hi, thanks for sharing! Does the x attribute in the cora_fixed_tfidf.pt file match the x attribute in the official PyG Cora dataset? Also, I noticed that the Cora Data object includes 10 different sets of train/valid/test masks. Which one did you use as the main split in your experiments?
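The thread does not say which of the 10 splits is the main one. When a dataset ships several predefined splits, one common convention (not confirmed for this repository) is to train once per split and report the mean and standard deviation. The sketch below assumes the masks are stored as a `num_nodes × num_splits` boolean matrix, a layout PyG uses for some datasets with multiple predefined splits; whether `cora_fixed_tfidf.pt` uses that layout is an assumption, and the helper names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

# Assumed layout: mask[i][k] is True if node i belongs to split k.
Mask = Sequence[Sequence[bool]]


def split_indices(mask: Mask, k: int) -> List[int]:
    """Node ids selected by column k of a [num_nodes, num_splits] boolean mask."""
    return [i for i, row in enumerate(mask) if row[k]]


def splits_are_disjoint(train: Mask, val: Mask, test: Mask, k: int) -> bool:
    """True if split k puts no node in more than one of train/val/test."""
    tr, va, te = (set(split_indices(m, k)) for m in (train, val, test))
    return not (tr & va or tr & te or va & te)


def summarize(accs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-split test accuracies."""
    return statistics.mean(accs), statistics.stdev(accs)
```

In use, you would loop `k` over all splits, train and evaluate with `split_indices(train_mask, k)` and friends, collect one accuracy per split, and pass the list to `summarize`.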

CurryTang commented 3 months ago


It's not aligned, since the original one is anonymized.
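Because the original PyG features are anonymized and cannot be mapped back to the raw texts, one option is to build bag-of-words features directly from `raw_texts`. The sketch below computes dense, L2-normalized TF-IDF rows with the standard library only. It uses the same smoothed idf formula as scikit-learn's `TfidfVectorizer` defaults, `ln((1 + n) / (1 + df)) + 1`, but a simpler tokenizer, and it does not claim to reproduce how `cora_fixed_tfidf.pt` was built.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Simple tokenizer (an assumption; differs from scikit-learn's default pattern).
TOKEN = re.compile(r"[a-z0-9]+")


def tfidf(texts: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Return (vocabulary, dense L2-normalized TF-IDF rows), one row per text."""
    docs = [Counter(TOKEN.findall(t.lower())) for t in texts]
    vocab = sorted(set().union(*docs))
    index = {w: j for j, w in enumerate(vocab)}
    n = len(docs)
    # Document frequency: each Counter's keys are unique, so each doc counts once.
    df = Counter(w for d in docs for w in d)
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for w, count in d.items():
            row[index[w]] = count * idf[w]  # raw term count times idf
        norm = math.sqrt(sum(v * v for v in row))
        if norm > 0:
            row = [v / norm for v in row]
        rows.append(row)
    return vocab, rows


if __name__ == "__main__":
    vocab, rows = tfidf(["graph neural network", "graph language model"])
    print(vocab)
```

For roughly 28k documents, dense rows like these become memory-heavy; scikit-learn's `TfidfVectorizer` returns a sparse matrix and is the more practical choice at that scale.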