LechengKong / OneForAll

A foundational graph learning framework that solves cross-domain/cross-task classification problems using one model.
MIT License
161 stars 22 forks

A question about the data. #7

cy623 closed this issue 7 months ago

cy623 commented 8 months ago

Is the 'Data(raw_text=[2708], y=[2708], label_names=[7], edge_index=[2, 10858], train_masks=[10], val_masks=[10], test_masks=[10], x=[2708, 384], raw_texts=[2708], category_names=[2708])' object stored in the .pt files under /data the raw data? The 'x' field seems to have undergone some processing. Is it normalized? Thanks!

LechengKong commented 8 months ago

Hi @cy623, yes, cora.pt and pubmed.pt are the raw files. However, we only use the raw text feature in 'raw_texts'. For the values in 'x', you can refer to this paper for details.

cy623 commented 8 months ago

I see, so it's the representation obtained from the raw text through an LLM? (x_i = LLM(s_vi)) Thank you very much for your response.
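
For anyone who wants to check the normalization question from the first comment empirically, here is a minimal sketch that tests whether every row of a feature matrix has unit L2 norm. The helper names `row_l2_norms` and `looks_l2_normalized`, the tolerance, and the toy values are all hypothetical and not part of OneForAll. L2 row normalization is only one possible meaning of "normalized"; the thread does not say which, if any, was applied to 'x'.

```python
import math


def row_l2_norms(rows):
    """Return the L2 (Euclidean) norm of each row in a 2-D sequence of numbers."""
    return [math.sqrt(sum(v * v for v in row)) for row in rows]


def looks_l2_normalized(rows, tol=1e-3):
    """Return True if every row has an L2 norm within `tol` of 1.0.

    `tol` is an arbitrary illustrative tolerance, not a value from the repository.
    """
    return all(abs(norm - 1.0) <= tol for norm in row_l2_norms(rows))


# Toy stand-in for the 'x' matrix (hypothetical values, not taken from cora.pt).
toy_x = [[0.6, 0.8], [1.0, 0.0]]
print(row_l2_norms(toy_x))
print(looks_l2_normalized(toy_x))
```

To apply this to the real features, one would presumably load the file with PyTorch and pass something like `data.x.tolist()` to `looks_l2_normalized`. How the .pt file unpacks into a `Data` object is an assumption here and should be checked against the repository's own loading code.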