EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License
6.9k stars 1.84k forks

eval gsm8k from local dataset folder with the bug info "ValueError: BuilderConfig 'main' not found." #1829

Open Jp-17 opened 5 months ago

Jp-17 commented 5 months ago

I have the same problem as in this issue ( https://github.com/EleutherAI/lm-evaluation-harness/issues/1347 ).

I just want to evaluate gsm8k from a local dataset folder, because networks in China cannot reach Hugging Face while lm-eval is running.

I followed the guide ( https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md#beautifying-table-display ) on using local datasets.

I used "dataset.save_to_disk()" to save the gsm8k dataset into a local folder, "llm/dataset/gsm8k". Then I set gsm8k.yaml as

    task: try_gsm8k
    dataset_path: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k
    dataset_name: main

or

    task: try_gsm8k
    dataset_path: gsm8k
    dataset_kwargs:
      data_dir: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k/
    dataset_name: main

Neither works, and both show the same error:

      File "/home/jiangp/.conda/envs/llm2/lib/python3.8/site-packages/datasets/builder.py", line 371, in __init__
        self.config, self.config_id = self._create_builder_config(
      File "/home/jiangp/.conda/envs/llm2/lib/python3.8/site-packages/datasets/builder.py", line 592, in _create_builder_config
        raise ValueError(
    ValueError: BuilderConfig 'main' not found. Available: ['default']
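
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: a folder written by save_to_disk() is not laid out like a Hub dataset repository, so it defines no config called main and the loader only sees a single default config. The sketch below uses only the standard library to check whether a folder looks like save_to_disk() output. The function name is hypothetical, and the marker files it looks for (dataset_dict.json, state.json) are the ones the datasets library is understood to write; the layout may differ between versions.

```python
import json
from pathlib import Path


def describe_saved_dataset(folder):
    """Guess whether `folder` was written by save_to_disk().

    Returns ("dataset_dict", [split names]), ("dataset", []) or ("unknown", []).
    Hypothetical helper: it only inspects marker files and loads no data.
    """
    root = Path(folder)
    dict_marker = root / "dataset_dict.json"
    if dict_marker.is_file():
        # A saved DatasetDict lists its splits in dataset_dict.json.
        splits = json.loads(dict_marker.read_text()).get("splits", [])
        return "dataset_dict", list(splits)
    if (root / "state.json").is_file():
        # A single saved Dataset keeps state.json next to its .arrow shards.
        return "dataset", []
    return "unknown", []
```

If the folder is reported as dataset_dict or dataset, datasets.load_from_disk() is the counterpart of save_to_disk() and is the loader that matches that layout.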

However, when I set gsm8k.yaml as

    task: gsm8k
    dataset_path: arrow # original gsm8k
    dataset_kwargs:
      data_files:
        train: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k/main/train/data-00000-of-00001.arrow
        test: /mnt/nfs/vault/jiangp/llm/dataset/gsm8k/main/test/data-00000-of-00001.arrow
    dataset_name: main

it works. This is not convenient, though: I also want to evaluate the mmlu benchmark, which contains many subtasks, and I would have to rewrite every subtask YAML with "data_files" under "dataset_kwargs".
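
Writing data_files by hand for every subtask could be avoided by generating the mapping. A minimal sketch, assuming each split lives in its own subfolder holding one or more .arrow shards, as in the paths above. Both function names and the returned dictionary (which mirrors the dataset_path and dataset_kwargs fields of the config above) are illustrative and not part of lm-eval; dataset_name is left out and would have to be added if the task needs it.

```python
from pathlib import Path


def build_arrow_data_files(config_dir):
    """Map each split subfolder of `config_dir` to its sorted .arrow shards.

    Assumes a layout like <config_dir>/<split>/data-00000-of-00001.arrow.
    """
    data_files = {}
    for split_dir in sorted(Path(config_dir).iterdir()):
        if not split_dir.is_dir():
            continue
        shards = sorted(str(p) for p in split_dir.glob("*.arrow"))
        if shards:  # skip folders that hold no Arrow shards
            data_files[split_dir.name] = shards
    return data_files


def arrow_task_overrides(config_dir):
    """Return fields to merge into a task YAML for local Arrow loading."""
    return {
        "dataset_path": "arrow",
        "dataset_kwargs": {"data_files": build_arrow_data_files(config_dir)},
    }
```

Calling arrow_task_overrides() once per subtask folder and merging the result into each task config would replace the manual edits, under the layout assumption stated above.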

Any help would be appreciated.

LitterBrother-Xiao commented 3 months ago

Hi, I have the same problem. Have you found a solution?

ningmenghongcha commented 2 months ago

same issue when using local dataset c-eval

ruleGreen commented 1 month ago

same issue for gsm8k

It seems to be solved by setting dataset_name to null and pointing dataset_path directly at the cache path (save_to_disk() is not needed):

    dataset_path: your_cache_path
    dataset_name: null
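
For benchmarks with many subtasks, the same two-field change could be applied programmatically to each task config once it is loaded as a dictionary. A minimal sketch: the function name is hypothetical, and whether this workaround holds for every dataset is not confirmed in this thread.

```python
import copy


def use_cached_dataset(task_config, cache_path):
    """Return a copy of a task config dict that points at a local cache.

    Sets dataset_path to `cache_path` and dataset_name to None, which a
    YAML dumper writes as `null`. All other keys are left untouched.
    """
    updated = copy.deepcopy(task_config)
    updated["dataset_path"] = cache_path
    updated["dataset_name"] = None
    return updated
```

The input dictionary is copied rather than modified, so the original task definition stays available for runs that can reach the Hub.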