XiaoxinHe / TAPE

Official Implementation of ICLR 2024 paper "Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning"
https://arxiv.org/abs/2305.19523
MIT License
188 stars 29 forks

Missing .ps file for Cora Dataset #5

Closed: ChenS676 closed this issue 1 year ago

ChenS676 commented 1 year ago

Dear Mr. He,

Firstly, I'd like to express my gratitude for sharing the code related to your remarkable research. I've encountered an issue during the reconstruction process. Specifically, I couldn't locate certain papers in the "extraction" folder.

Could you please let me know if there's an updated version of the Cora dataset available for public use? Alternatively, I might have misunderstood the instructions provided in the README.md.

To replicate the issue I faced, you can run the following code:

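# Assumes get_cora_casestudy and get_raw_text_cora from the repository's Cora loading code are in scope.
if __name__ == '__main__':
    data, data_citeid = get_cora_casestudy()
    data, text = get_raw_text_cora(use_text=True)
    print(data)
    print(data_citeid)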

Executing the above should help you reproduce the error I encountered.

Thank you for your time and assistance.

Best regards, csh

XiaoxinHe commented 1 year ago

Hi,

To resolve this, you'll need to download the dataset first. Here's how you can do it for the cora dataset:

1. Download the cora dataset from this link.
2. After downloading, unzip the dataset.
3. Move the unzipped dataset folder to the following directory: dataset/cora_orig.

For more detailed instructions, you can refer to "1. Download TAG datasets" in the readme file of the project.
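The steps above can be sanity-checked before running the loader. The following is a minimal sketch, not part of the repository: the helper name check_dataset_dir is hypothetical, and it only verifies that the dataset/cora_orig folder mentioned above exists and is not empty. It does not check which files the loader expects inside it.

```python
from pathlib import Path


def check_dataset_dir(path="dataset/cora_orig"):
    """Return the number of entries found directly under `path`.

    Hypothetical helper: raises FileNotFoundError if the folder is
    missing or empty, which usually means a download step was skipped.
    """
    root = Path(path)
    if not root.is_dir():
        raise FileNotFoundError(
            f"{root} does not exist; download and unzip the dataset first"
        )
    entries = list(root.iterdir())
    if not entries:
        raise FileNotFoundError(
            f"{root} is empty; the unzipped files were not moved here"
        )
    return len(entries)
```

Calling check_dataset_dir() from the repository root before get_raw_text_cora gives a clearer error message than a failure deep inside the loading code.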

This should hopefully resolve the data loading issue. If you encounter any further problems, please don't hesitate to ask for assistance.

Thanks!

ChenRunjin commented 1 year ago

Hi, I ran into this problem too. I followed the instructions above, downloaded the dataset, and moved it to the right location, but the problem persists. It seems that we need to replace ':' with '_' in fn.
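To see which files are affected, a small diagnostic along these lines may help. It is a sketch under assumptions: find_missing is a hypothetical helper, filenames stands for whatever names the loader derives from the paper ids, and folder is the directory those files are expected in. It applies the ':' to '_' replacement and reports which names are still not found.

```python
import os


def find_missing(filenames, folder):
    """Split `filenames` into (found, missing) after replacing ':' with '_'.

    The comparison is case-sensitive, so a name that differs from the
    file on disk only in letter case is reported as missing.
    """
    present = set(os.listdir(folder))
    found, missing = [], []
    for fn in filenames:
        candidate = fn.replace(':', '_')
        if candidate in present:
            found.append(fn)
        else:
            missing.append(fn)
    return found, missing
```

Any names left in the missing list after the replacement point to a different mismatch, such as a difference in letter case.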

ChenS676 commented 1 year ago

I think XiaoxinHe answered this earlier by email:

"Thank you for the detailed description of the issue.

I would like to clarify that our dataset is available in two distinct variations: one for the original text attributes (i.e., title and abstract), another for the LLM responses (i.e., prediction and explanation).

[link1] cora + original text attributes: https://drive.google.com/file/d/1oo6EbCjrwOabjjudT5LGx75Ks9_HBAMs/view
[link2] cora + LLM responses: https://drive.google.com/file/d/1tSepgcztiNNth4kkSR-jyGkNnN7QDYax/view

Based on your description, it appears that you are attempting to load the dataset with the original text attributes. If this is indeed the case, please make sure to use link1 for the download rather than link2."

At least it works for me.

To be honest, three datasets do not seem enough to prove the concept. If you have found other text-attributed graphs, please let me know, Mr./Ms. ChenRunjin.

ZhuYun97 commented 1 year ago

Replying to ChenRunjin's comment above about replacing ':' with '_' in fn:

For the Cora dataset, three filenames need additional special handling when loading the original texts:

# The files on disk use '_' where fn has ':'.
fn = fn.replace(':', '_')
# Three files differ from fn only in letter case.
if fn == 'http_##www.cs.ucc.ie#~dgb#papers#ICCBR2.ps.Z':
    fn = 'http_##www.cs.ucc.ie#~dgb#papers#iccbr2.ps.Z'
if fn == 'http_##www.cs.ucl.ac.uk#staff#t.yu#ep97.ps':
    fn = 'http_##www.cs.ucl.ac.uk#staff#T.Yu#ep97.ps'
if fn == 'http_##www.cs.ucl.ac.uk#staff#t.yu#pgp.new.ps':
    fn = 'http_##www.cs.ucl.ac.uk#staff#T.Yu#pgp.new.ps'

Add this code after this line.
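The same fix can also be written as a lookup table, which keeps the special cases in one place. This is a sketch, not the repository's code: normalize_filename and SPECIAL_CASES are hypothetical names, and the three entries are copied from the snippet above.

```python
# Name derived from the paper id -> name of the file on disk.
# The three entries are the special cases listed in the snippet above.
SPECIAL_CASES = {
    'http_##www.cs.ucc.ie#~dgb#papers#ICCBR2.ps.Z':
        'http_##www.cs.ucc.ie#~dgb#papers#iccbr2.ps.Z',
    'http_##www.cs.ucl.ac.uk#staff#t.yu#ep97.ps':
        'http_##www.cs.ucl.ac.uk#staff#T.Yu#ep97.ps',
    'http_##www.cs.ucl.ac.uk#staff#t.yu#pgp.new.ps':
        'http_##www.cs.ucl.ac.uk#staff#T.Yu#pgp.new.ps',
}


def normalize_filename(fn):
    """Replace ':' with '_' and then apply the known special cases."""
    fn = fn.replace(':', '_')
    # Names without a special case are returned unchanged.
    return SPECIAL_CASES.get(fn, fn)
```

With this helper, the loader would call fn = normalize_filename(fn) at the point where the if statements were added.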

XiaoxinHe commented 1 year ago

Hi,

I've uploaded an updated version of the cora dataset that should resolve the issue. You can now download it with the corrected filename format. Let me know if you encounter any more problems.