EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Pile dataset not found #1338

Closed by RenatoGeh 8 months ago

RenatoGeh commented 9 months ago

Hi,

I'm trying to run the pile group through lm_eval.simple_evaluate, but I am getting the following error:

Traceback (most recent call last):
  File "/scratch/renatolg/tokens/harness.py", line 72, in <module>
    res = lm_eval.simple_evaluate(**eval_args, bootstrap_iters=0)
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/utils.py", line 415, in _wrapper
    return fn(*args, **kwargs)
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/evaluator.py", line 122, in simple_evaluate
    task_dict = lm_eval.tasks.get_task_dict(tasks)
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 255, in get_task_dict
    task_obj = get_task_dict(task_name)
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 275, in get_task_dict
    task_name: get_task(task_name=task_element, config=config),
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/tasks/__init__.py", line 217, in get_task
    return TASK_REGISTRY[task_name](config=config)
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/api/task.py", line 622, in __init__
    self.download(self.config.dataset_kwargs)
  File "/home/renatolg/.local/lib/python3.10/site-packages/lm_eval/api/task.py", line 717, in download
    self.dataset = datasets.load_dataset(
  File "/home/renatolg/.local/lib/python3.10/site-packages/datasets/load.py", line 2129, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/renatolg/.local/lib/python3.10/site-packages/datasets/load.py", line 1852, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/home/renatolg/.local/lib/python3.10/site-packages/datasets/builder.py", line 373, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/home/renatolg/.local/lib/python3.10/site-packages/datasets/builder.py", line 539, in _create_builder_config
    raise ValueError(
ValueError: BuilderConfig 'pile_arxiv' not found. Available: ['all', 'enron_emails', 'europarl', 'free_law', 'hacker_news', 'nih_exporter', 'pubmed', 'pubmed_central', 'ubuntu_irc', 'uspto', 'github']

I thought the Pile (and its subsets, like pile_arxiv) was included in lm-evaluation-harness?

Just to clarify, I did successfully initialize tasks with lm_eval.tasks.initialize_tasks().

Thanks

haileyschoelkopf commented 9 months ago

Hi!

To run the Pile tasks, you'll need to apply the fix described in https://github.com/EleutherAI/lm-evaluation-harness/issues/731 and have a local copy of the Pile, since it is no longer downloadable via the Eye.
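
For anyone reading the traceback above, here is a minimal sketch of what the ValueError is reporting. It only parses the error text copied from the traceback and makes no calls into lm_eval or datasets. The helper name parse_builder_config_error is hypothetical, introduced just for illustration. It checks whether the requested config, with or without its pile_ prefix, appears among the configs the hosted dataset says it has.

```python
import ast
import re

# Error text copied verbatim from the traceback in this thread.
ERROR_MESSAGE = (
    "BuilderConfig 'pile_arxiv' not found. Available: "
    "['all', 'enron_emails', 'europarl', 'free_law', 'hacker_news', "
    "'nih_exporter', 'pubmed', 'pubmed_central', 'ubuntu_irc', 'uspto', 'github']"
)


def parse_builder_config_error(message):
    """Hypothetical helper: return (requested_name, available_names)."""
    match = re.match(
        r"BuilderConfig '([^']+)' not found\. Available: (\[.*\])", message
    )
    if match is None:
        raise ValueError("unrecognized error message format")
    requested = match.group(1)
    # The available names are printed as a Python list literal.
    available = ast.literal_eval(match.group(2))
    return requested, available


requested, available = parse_builder_config_error(ERROR_MESSAGE)

# Drop the "pile_" prefix to rule out a simple naming mismatch.
prefix = "pile_"
stripped = requested[len(prefix):] if requested.startswith(prefix) else requested

print("requested config:", requested)
print("found as-is:", requested in available)
print("found without prefix:", stripped in available)
```

Neither pile_arxiv nor arxiv is among the listed configs, which suggests the subset is simply not served by the dataset that was loaded rather than being misnamed. That fits with needing a local copy of the Pile.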