How to map the IDs in your released dataset to the original IDs?

VITA-Group / LLaGA

[ICML2024] "LLaGA: Large Language and Graph Assistant", Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, Zhangyang Wang

Apache License 2.0


How to map the IDs in your released dataset to the original IDs? #7

twelfth-star closed this issue 7 months ago

twelfth-star commented 7 months ago

Thank you for your excellent work.

I am trying to test some baseline models on the same data split that you used. However, I noticed that the data files you published, such as edge_sampled_2_10_only_test.jsonl, use a new set of IDs. How can I get the correspondence between your IDs and the original IDs in the dataset? For example, for the PubMed dataset, how can I map your IDs to the original PMIDs?

ChenRunjin commented 7 months ago

Hi, we follow Graph-LLM to process our data. You can refer to the parse_pubmed() function in https://github.com/CurryTang/Graph-LLM/blob/master/data.py to see how the original IDs map to our IDs.
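For readers who want to build the lookup themselves, here is a minimal sketch of the idea. It is not the code from data.py. It assumes that parse_pubmed() gives each paper a zero-based index in the order the paper appears in the raw node file, that the file starts with two header lines, and that each row is tab-separated with the original PMID in the first field. The function name build_index_to_pmid and the sample rows are made up for illustration, so please check these assumptions against parse_pubmed() before relying on the result.

```python
from typing import Dict, Iterable


def build_index_to_pmid(lines: Iterable[str], n_header_lines: int = 2) -> Dict[int, str]:
    """Map a zero-based row index to the original ID in the first tab-separated field.

    ASSUMPTION: the released dataset numbers nodes in raw-file order,
    after skipping `n_header_lines` header lines. Verify this against
    parse_pubmed() in Graph-LLM's data.py.
    """
    # Drop the header lines and ignore any blank rows.
    rows = [row.strip() for row in list(lines)[n_header_lines:]]
    rows = [row for row in rows if row]

    index_to_pmid: Dict[int, str] = {}
    for idx, row in enumerate(rows):
        # The first tab-separated field is taken to be the original ID.
        index_to_pmid[idx] = row.split("\t")[0]
    return index_to_pmid


# Hypothetical rows standing in for the raw node file (IDs are made up).
sample_lines = [
    "HEADER LINE 1\n",
    "HEADER LINE 2\n",
    "11111111\tlabel=1\tsome_feature=0.1\n",
    "22222222\tlabel=2\tsome_feature=0.2\n",
    "33333333\tlabel=3\tsome_feature=0.3\n",
]

index_to_pmid = build_index_to_pmid(sample_lines)
# Invert the mapping to go from an original ID back to the dataset index.
pmid_to_index = {pmid: idx for idx, pmid in index_to_pmid.items()}
```

To use it on real data, pass an open file handle for the raw PubMed node file instead of sample_lines, then look up the IDs found in files such as edge_sampled_2_10_only_test.jsonl in index_to_pmid.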

twelfth-star commented 7 months ago

I see. Thank you for your kind reply!