Babelscape / rebel

REBEL is a seq2seq model that simplifies Relation Extraction (EMNLP 2021).

Error while executing CoNLL04 dataset #69

Closed: mayanku closed this 1 year ago

mayanku commented 1 year ago

I am getting this error while running the BART model. I have checked multiple times and my data is in the correct place. What could be the reason for this?

(test_rebel_mayank) anish@dellr7525:~/mayank_code/rebel-new/rebel/src$ python train.py model=default_model data=conll04_data train=conll04_train
Global seed set to 42
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Downloading and preparing dataset conll04_typed/default to /home/anish/.cache/huggingface/datasets/conll04_typed/default-af2570c928fedc0e/0.0.0/87090529d4f9584f9643d0dff3797eec01bcdb028753cb95de9e0445c48d8b32...
Generating train split: 0 examples [00:00, ? examples/s]
[2023-09-09 16:35:06,440][root][INFO] - generating examples from = [PosixPath('/home/anish/mayank_code/rebel-new/rebel/data/datasets/conll04/conll04_train.json')]
Traceback (most recent call last):
  File "/home/anish/miniconda3/envs/test_rebel_mayank/lib/python3.7/site-packages/datasets/builder.py", line 1629, in _prepare_split_single
    for key, record in generator:
  File "/home/anish/.cache/huggingface/modules/datasets_modules/datasets/conll04_typed/87090529d4f9584f9643d0dff3797eec01bcdb028753cb95de9e0445c48d8b32/conll04_typed.py", line 103, in _generate_examples
    with open(filepath) as json_file:
  File "/home/anish/miniconda3/envs/test_rebel_mayank/lib/python3.7/site-packages/datasets/streaming.py", line 71, in wrapper
    return function(*args, use_auth_token=use_auth_token, **kwargs)
  File "/home/anish/miniconda3/envs/test_rebel_mayank/lib/python3.7/site-packages/datasets/download/streaming_download_manager.py", line 493, in xopen
    return open(main_hop, mode, *args, **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: "[PosixPath('/home/anish/mayank_code/rebel-new/rebel/data/datasets/conll04/conll04_train.json')]"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 107, in main
    train(conf)
  File "train.py", line 55, in train
    pl_data_module = BasePLDataModule(conf, tokenizer, model)
  File "/home/anish/miniconda3/envs/test_rebel_mayank/lib/python3.7/site-packages/pytorch_lightning/core/datamodule.py", line 49, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/home/anish/mayank_code/rebel-new/rebel/src/pl_data_modules.py", line 68, in __init__
    self.datasets = load_dataset(conf.dataset_name, data_files={'train': conf.train_file, 'dev': conf.validation_file, 'test': conf.test_file})
  File "/home/anish/miniconda3/envs/test_rebel_mayank/lib/python3.7/site-packages/datasets/load.py", line 1815, in load_dataset
    storage_options=storage_options,
  File "/home/anish/miniconda3/envs/test_rebel_mayank/lib/python3.7/site-packages/datasets/builder.py", line 913, in download_and_prepare
    **download_and_prepare_kwargs,
  File "/home/anish/miniconda3/envs/test_rebel_mayank/lib/python3.7/site-packages/datasets/builder.py", line 1675, in _download_and_prepare
    **prepare_splits_kwargs,
  File "/home/anish/miniconda3/envs/test_rebel_mayank/lib/python3.7/site-packages/datasets/builder.py", line 1004, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/anish/miniconda3/envs/test_rebel_mayank/lib/python3.7/site-packages/datasets/builder.py", line 1509, in _prepare_split
    gen_kwargs=gen_kwargs, job_id=job_id, **_prepare_split_args
  File "/home/anish/miniconda3/envs/test_rebel_mayank/lib/python3.7/site-packages/datasets/builder.py", line 1665, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

tangxuemei1995 commented 1 year ago

I ran into this bug too. I changed _generate_examples in conll04_typed.py from

    def _generate_examples(self, filepath):
        """This function returns the examples in the raw (text) triplet form."""
        logging.info("generating examples from = %s", filepath)
        print('filepath', filepath)
        # filepath = 
        with open(filepath) as json_file:

to

    def _generate_examples(self, filepath):
        """This function returns the examples in the raw (text) triplet form."""
        logging.info("generating examples from = %s", filepath)
        print('filepath', filepath)
        # filepath = 
        with open(filepath[0]) as json_file:

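After changing filepath to filepath[0], it works. The FileNotFoundError above shows why: filepath arrives as a one-element list, [PosixPath('.../conll04_train.json')], so open() is handed the string form of the whole list instead of a single path.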
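A slightly more defensive variant of the same workaround is sketched below. It accepts either a single path or a list of paths, so it should keep working whether or not the installed datasets version wraps the path in a list. The helper name first_path is hypothetical and not part of conll04_typed.py or the datasets library; this is only a sketch of the idea, not the maintainers' fix.

```python
from pathlib import Path


def first_path(filepath):
    """Return one path from either a single path or a list/tuple of paths.

    Hypothetical helper (not in the REBEL repo): in the traceback above,
    filepath reached _generate_examples as [PosixPath(...)], a one-element list.
    """
    if isinstance(filepath, (list, tuple)):
        if not filepath:
            raise ValueError("filepath is an empty list")
        # Assumption: each split has exactly one data file, so take the first.
        return filepath[0]
    return filepath


if __name__ == "__main__":
    # Both call styles resolve to the same single path.
    print(first_path([Path("conll04_train.json")]))
    print(first_path(Path("conll04_train.json")))
```

Inside _generate_examples this would be used as open(first_path(filepath)) in place of open(filepath[0]).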