Open Jp-17 opened 5 months ago
Hi, I'm running into the same problem. Have you found a solution?
Same issue when using the local dataset c-eval.
same issue for gsm8k
same issue for gsm8k
It seems to be solved by setting dataset_name to null and pointing dataset_path directly at the cache path.
    dataset_path: your_cache_path  # no need to call save_to_disk()
    dataset_name: null
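To make the effect of this setting easier to see, here is a minimal sketch of how the task YAML fields are assumed to reach the `datasets` library. It assumes that lm-eval forwards `dataset_path` as `path`, `dataset_name` as `name`, and the entries of `dataset_kwargs` as extra keyword arguments to `datasets.load_dataset`. The helper `to_load_dataset_kwargs` is hypothetical; it only builds the argument dict and does not load anything.

```python
from typing import Any, Dict, Optional


def to_load_dataset_kwargs(
    dataset_path: str,
    dataset_name: Optional[str] = None,
    dataset_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the keyword arguments that are assumed to be
    passed on to datasets.load_dataset(**kwargs) for one task config."""
    kwargs: Dict[str, Any] = {"path": dataset_path, "name": dataset_name}
    kwargs.update(dataset_kwargs or {})
    return kwargs


# Setting from this comment: no config name is requested, so the loader is
# not asked to look up a BuilderConfig called 'main'.
cache_cfg = to_load_dataset_kwargs("your_cache_path", None)

# Setting from the original report: name='main' is requested explicitly.
named_cfg = to_load_dataset_kwargs("/mnt/nfs/vault/jiangp/llm/dataset/gsm8k", "main")

print(cache_cfg)
print(named_cfg)
```

Under that assumption, `name=None` lets the library pick the only config it finds, which would be consistent with the `Available: ['default']` part of the reported error.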
I have the same problem as in this issue ( https://github.com/EleutherAI/lm-evaluation-harness/issues/1347 ).
I just want to evaluate gsm8k from a local dataset folder, because Hugging Face is not reachable from the network in China while running lm-eval.
I followed the "Using Local Datasets" section of the guide ( https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md#beautifying-table-display ).
I used "dataset.save_to_disk()" to save the gsm8k dataset into a local folder, "llm/dataset/gsm8k". Then I set gsm8k.yaml as

    task: try_gsm8k
    dataset_path: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k
    dataset_name: main

or

    task: try_gsm8k
    dataset_path: gsm8k
    dataset_kwargs:
      data_dir: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k/
    dataset_name: main

Neither works; both fail with the same error:

    File "/home/jiangp/.conda/envs/llm2/lib/python3.8/site-packages/datasets/builder.py", line 371, in __init__
      self.config, self.config_id = self._create_builder_config(
    File "/home/jiangp/.conda/envs/llm2/lib/python3.8/site-packages/datasets/builder.py", line 592, in _create_builder_config
      raise ValueError(
    ValueError: BuilderConfig 'main' not found. Available: ['default']
However, when I set gsm8k.yaml as

    task: gsm8k
    dataset_path: arrow # original gsm8k
    dataset_kwargs:
      data_files:
        train: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k/main/train/data-00000-of-00001.arrow
        test: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k/main/test/data-00000-of-00001.arrow
    dataset_name: main

it works. But this is inconvenient: I also want to evaluate the mmlu benchmark, which contains many subtasks, and I would have to edit every subtask YAML to add "data_files" under dataset_kwargs.
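One way to avoid editing each subtask by hand would be to generate the configs. The following is a minimal sketch, not something taken from lm-eval itself. It assumes each dataset was saved with `save_to_disk()` so that every split has its own folder containing one or more `.arrow` shards, as in the gsm8k paths above. The function names `find_arrow_files` and `render_task_yaml` are hypothetical, and the emitted keys simply mirror the gsm8k config that worked.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def find_arrow_files(dataset_dir: Path) -> Dict[str, List[str]]:
    """Map each split folder (e.g. train, test) to its sorted .arrow shards."""
    splits: Dict[str, List[str]] = {}
    for split_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        shards = sorted(str(f) for f in split_dir.glob("*.arrow"))
        if shards:
            splits[split_dir.name] = shards
    return splits


def render_task_yaml(task: str, dataset_dir: Path, dataset_name: Optional[str]) -> str:
    """Render a task config with the same keys as the gsm8k config that worked."""
    lines = [
        f"task: {task}",
        "dataset_path: arrow",
        "dataset_kwargs:",
        "  data_files:",
    ]
    for split, shards in find_arrow_files(dataset_dir).items():
        lines.append(f"    {split}:")
        lines.extend(f"      - {shard}" for shard in shards)
    name_value = dataset_name if dataset_name is not None else "null"
    lines.append(f"dataset_name: {name_value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Build a tiny fake layout so the sketch runs without any real dataset.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "gsm8k" / "main"
        for split in ("train", "test"):
            (root / split).mkdir(parents=True)
            (root / split / "data-00000-of-00001.arrow").touch()
        print(render_task_yaml("gsm8k", root, "main"))
```

Looping this over the mmlu subtask folders and writing each result to its own file would produce one YAML per subtask. Paths are written unquoted here, so a path containing YAML-special characters would need quoting.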
Any help would be appreciated.