RManLuo / reasoning-on-graphs

Official Implementation of ICLR 2024 paper: "Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning"
https://arxiv.org/abs/2310.01061
MIT License
291 stars, 31 forks

Data path when running generate_explanation_results.py #12

Closed: meaningful96 closed this issue 1 month ago

meaningful96 commented 1 month ago

Hello, thanks for sharing your work.

I have a question about the process of building interpretable examples. I'm currently preprocessing the dataset and encountered the following error:

(RoG) youminkk@gold:~/Paper/RoG$ python src/joint_training/preprocess_qa.py
[2024-07-16 14:53:18,110] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/youminkk/miniconda3/envs/RoG/lib/python3.10/site-packages/transformers/utils/generic.py:260: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/youminkk/miniconda3/envs/RoG/lib/python3.10/site-packages/transformers/utils/generic.py:260: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/youminkk/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Creating json from Arrow format: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 27.54ba/s]
Resolving data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 78.69it/s]
Resolving data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 106936.93it/s]
Creating json from Arrow format: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:00<00:00, 28.45ba/s]
(RoG) youminkk@gold:~/Paper/RoG$ python src/joint_training/generate_explanation_results.py
[2024-07-16 14:53:39,471] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/youminkk/miniconda3/envs/RoG/lib/python3.10/site-packages/transformers/utils/generic.py:260: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/youminkk/miniconda3/envs/RoG/lib/python3.10/site-packages/transformers/utils/generic.py:260: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/youminkk/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "/home/youminkk/Paper/RoG/src/joint_training/generate_explanation_results.py", line 132, in <module>
    train_dataset = datasets.load_dataset(input_file, split="train")
  File "/home/youminkk/miniconda3/envs/RoG/lib/python3.10/site-packages/datasets/load.py", line 2594, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/youminkk/miniconda3/envs/RoG/lib/python3.10/site-packages/datasets/load.py", line 2266, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/home/youminkk/miniconda3/envs/RoG/lib/python3.10/site-packages/datasets/load.py", line 1916, in dataset_module_factory
    raise FileNotFoundError(
FileNotFoundError: Couldn't find a dataset script at /home/youminkk/Paper/RoG/datasets/joint_training/qa/webqsp/webqsp.py or any data file in the same directory.
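The FileNotFoundError means `datasets.load_dataset` received a local directory that contains neither a loading script nor any data file it recognizes. A quick way to see what the script is actually pointing at is to list the data files under that directory. The helper below is hypothetical (it is not part of the repository), and its extension list is an assumption that only approximates the formats `datasets` can load.

```python
import os
from typing import List

# Assumption: a short, non-exhaustive list of file types `datasets` can
# build a dataset from when given a local directory.
DATA_EXTENSIONS = (".json", ".jsonl", ".csv", ".parquet", ".arrow", ".txt")


def find_data_files(directory: str) -> List[str]:
    """Return the data files found (recursively) under `directory`.

    An empty result suggests that `datasets.load_dataset(directory)` will
    fail with a FileNotFoundError like the one above.
    """
    directory = os.path.expanduser(directory)
    if not os.path.isdir(directory):
        return []
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(DATA_EXTENSIONS):
                found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    # Path taken from the error message; run from the repository root.
    path = "datasets/joint_training/qa/webqsp"
    files = find_data_files(path)
    print(f"{path}: {len(files)} data file(s)")
    for file_path in files:
        print("  ", file_path)
```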

I checked the code in generate_explanation_results.py and found the section where the data path is specified.

Lines 15-19:

save_dir = "datasets/joint_training/ExplainQAData"
split="train"
model_max_length = 1024
data_list = ['webqsp', 'cwq']
data_path = "/home/lluo/projects/KIT/data/KGQA"

How should I modify this part? For reference, all preprocessing steps completed successfully. Could you let me know which path I should set here?
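One way to avoid path mistakes like these is to resolve every input directory to an absolute path up front and check that it exists before calling `load_dataset`. The sketch below assumes each dataset lives in `<data_path>/<name>`, as the path in the error message suggests; how the script actually combines `data_path` and `data_list` may differ, and `resolve_input_dirs` is a hypothetical helper.

```python
import os
from typing import Dict, List

# Values mirroring the snippet above, with a repository-relative base
# directory taken from the error message (an assumption).
data_list = ["webqsp", "cwq"]
data_path = "datasets/joint_training/qa"


def resolve_input_dirs(base: str, names: List[str], repo_root: str = ".") -> Dict[str, str]:
    """Map each dataset name to an absolute input directory (hypothetical helper).

    Python's own file APIs (open, os.path) do not expand a leading "~", so the
    base path goes through os.path.expanduser first; a path that is still
    relative is then resolved against `repo_root`.
    """
    base = os.path.expanduser(base)
    if not os.path.isabs(base):
        base = os.path.join(os.path.abspath(repo_root), base)
    return {name: os.path.join(base, name) for name in names}


if __name__ == "__main__":
    for name, path in resolve_input_dirs(data_path, data_list).items():
        status = "found" if os.path.isdir(path) else "MISSING"
        print(f"{name}: {path} [{status}]")
```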

meaningful96 commented 1 month ago

When I set the path as follows and run the script, a different error occurs:

data_list = ['RoG-webqsp', 'RoG-cwq']
data_path = "~/datasets/joint_training/qa"
(RoG) youminkk@gold:~/Paper/RoG$ python src/joint_training/generate_explanation_results.py
[2024-07-16 15:29:26,013] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/youminkk/miniconda3/envs/RoG/lib/python3.10/site-packages/transformers/utils/generic.py:260: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/youminkk/miniconda3/envs/RoG/lib/python3.10/site-packages/transformers/utils/generic.py:260: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/youminkk/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Generating train split: 2826 examples [00:00, 125695.69 examples/s]
Processing RoG-webqsp...
Number of process: 1
  0%|                                                                                                                                                                     | 0/1000 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/youminkk/miniconda3/envs/RoG/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/youminkk/Paper/RoG/src/joint_training/generate_explanation_results.py", line 108, in formatting_prompts_func
    output_label = "\n".join(example['answer'])
KeyError: 'answer'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/youminkk/Paper/RoG/src/joint_training/generate_explanation_results.py", line 141, in <module>
    for example in tqdm(pool.imap_unordered(formatting_prompts_func, train_dataset), total=len(train_dataset)):
  File "/home/youminkk/.local/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/home/youminkk/miniconda3/envs/RoG/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
KeyError: 'answer'
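The KeyError is raised inside a worker process, so it only surfaces once the pool is running and arrives wrapped in a RemoteTraceback. A cheap guard is to compare the dataset's columns with the fields the formatting function reads before creating the pool. This is a hypothetical helper, not part of the repository; only the `answer` field is confirmed by the traceback.

```python
from typing import Iterable, Sequence

# Only "answer" is confirmed by the traceback above; extend the tuple with any
# other fields the formatting function reads.
REQUIRED_FIELDS = ("answer",)


def check_columns(columns: Iterable[str], required: Sequence[str] = REQUIRED_FIELDS) -> None:
    """Raise a descriptive KeyError if any required column is missing.

    Running this in the main process, before creating the multiprocessing
    pool, reports the problem without a RemoteTraceback.
    """
    available = set(columns)
    missing = [name for name in required if name not in available]
    if missing:
        raise KeyError(
            f"missing column(s) {missing}; available: {sorted(available)}"
        )
```

With a `datasets.Dataset`, this would be called as `check_columns(train_dataset.column_names)` right after loading.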

RManLuo commented 1 month ago

Thanks for noticing. I have updated the code to load the datasets directly from Hugging Face (commit 418cd4bc066e8a37e07f360349a8feb0ff920229).
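Loading from the Hugging Face Hub removes the dependency on a local data path entirely. A minimal sketch, assuming the Hub dataset IDs follow the pattern `rmanluo/RoG-<name>` (consistent with the `RoG-webqsp` and `RoG-cwq` names used earlier in this thread); the updated script may use different IDs, and both helpers are hypothetical.

```python
# Assumption: Hub dataset IDs look like "rmanluo/RoG-webqsp"; check the
# updated script for the exact IDs it uses.
HUB_NAMESPACE = "rmanluo"


def hub_dataset_id(name: str, namespace: str = HUB_NAMESPACE) -> str:
    """Build a Hugging Face Hub dataset ID such as 'rmanluo/RoG-webqsp'."""
    return f"{namespace}/RoG-{name}"


def load_train_split(name: str):
    """Download (or reuse the cached copy of) the train split from the Hub."""
    import datasets  # imported lazily so hub_dataset_id works without the library

    return datasets.load_dataset(hub_dataset_id(name), split="train")
```

For example, `load_train_split("webqsp")` would fetch the train split on first use and read it from the local cache afterwards.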

meaningful96 commented 1 month ago

@RManLuo Thank you for the quick response. I've solved the problem. Thanks for your help.

RManLuo commented 1 month ago

Great to hear that. Thanks for your interest in our work.