EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

`NotADirectoryError` on dataset `headqa_en` #1428

Open RylanSchaeffer opened 7 months ago

RylanSchaeffer commented 7 months ago

Hi! I'm trying to evaluate several models on as many tasks as possible. On one task (`headqa_en`), loading the dataset fails with a `NotADirectoryError`.

Command:

lm_eval --model hf \
--model_args pretrained=EleutherAI/pythia-410m,revision=step30000 \
--tasks headqa_en \
--device cuda:2 \
--batch_size auto:4 \
--output_path <my output path> \
--log_samples

Stack Trace:

2024-02-13:08:57:04,245 INFO     [__main__.py:162] Verbosity set to INFO
2024-02-13:08:57:04,245 INFO     [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-13:08:57:07,043 INFO     [__main__.py:238] Selected Tasks: ['headqa_en']
2024-02-13:08:57:07,043 INFO     [__main__.py:239] Loading selected tasks...
2024-02-13:08:57:07,717 WARNING  [logging.py:61] Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
2024-02-13:08:57:07,717 INFO     [huggingface.py:155] Using device 'cuda:2'
/lfs/skampere1/0/rschaef/miniconda3/envs/pred_llm_evals_env/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-02-13:08:57:09,790 INFO     [evaluator.py:134] get_task_dict has been updated to accept an optional argument, `task_manager`Read more here:https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md#external-library-usage
/lfs/skampere1/0/rschaef/miniconda3/envs/pred_llm_evals_env/lib/python3.10/site-packages/datasets/load.py:1454: FutureWarning: The repository for EleutherAI/headqa contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/EleutherAI/headqa
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
  warnings.warn(
Generating train split: 0 examples [00:00, ? examples/s]
Traceback (most recent call last):
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/pred_llm_evals_env/lib/python3.10/site-packages/datasets/builder.py", line 1726, in _prepare_split_single
    for key, record in generator:
  File "/lfs/skampere1/0/rschaef/.cache/huggingface/modules/datasets_modules/datasets/EleutherAI--headqa/00117bfe3562be1fdb3c897dbf75101717cf239741a954423d1e344335a96089/headqa.py", line 138, in _generate_examples
    with open(filepath, encoding="utf-8") as f:
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/pred_llm_evals_env/lib/python3.10/site-packages/datasets/streaming.py", line 75, in wrapper
    return function(*args, download_config=download_config, **kwargs)
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/pred_llm_evals_env/lib/python3.10/site-packages/datasets/download/streaming_download_manager.py", line 507, in xopen
    return open(main_hop, mode, *args, **kwargs)
NotADirectoryError: [Errno 20] Not a directory: '/lfs/skampere1/0/rschaef/data/huggingface/downloads/9d10351eefe83ab9887de1b307f40404b99de9ba10fed427d64faa36ae611778/HEAD_EN/train_HEAD_EN.json'

haileyschoelkopf commented 7 months ago

I will check this out and report back!

I had been told that the HeadQA data was no longer available, but I just checked and was able to download the data directly from the link: https://huggingface.co/datasets/EleutherAI/headqa/blob/1733f0af3205b1be2d25cd7a9a8ec891b26aa92d/headqa.py#L59
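
For anyone debugging the traceback above: `[Errno 20] Not a directory` generally means that some component of the path exists but is a regular file rather than a directory. A minimal diagnostic sketch, using only the standard library, is below. The helper name `first_non_directory_ancestor` is hypothetical, and the guess that the hash-named entry under `downloads/` is an archive that was never extracted is an assumption, not something confirmed in this thread.

```python
import zipfile
from pathlib import Path
from typing import Optional


def first_non_directory_ancestor(path: str) -> Optional[Path]:
    """Return the first ancestor of `path` that exists but is not a directory.

    Hypothetical helper for diagnosing ENOTDIR; returns None when every
    existing ancestor is a directory.
    """
    target = Path(path)
    # `parents` is ordered nearest-first, so reverse it to walk from the root down.
    for ancestor in reversed(target.parents):
        if ancestor.exists() and not ancestor.is_dir():
            return ancestor
    return None


if __name__ == "__main__":
    # Path copied from the error message in this issue.
    failing_path = (
        "/lfs/skampere1/0/rschaef/data/huggingface/downloads/"
        "9d10351eefe83ab9887de1b307f40404b99de9ba10fed427d64faa36ae611778/"
        "HEAD_EN/train_HEAD_EN.json"
    )
    culprit = first_non_directory_ancestor(failing_path)
    if culprit is None:
        print("No path component is a regular file; the cause lies elsewhere.")
    else:
        # Assumption: the offending file may be a zip archive that was not extracted.
        print(f"Not a directory: {culprit}")
        print(f"Looks like a zip archive: {zipfile.is_zipfile(culprit)}")
```

If the reported component does turn out to be a plain file, that would suggest the dataset script is treating a downloaded file as an extracted directory, but this is only a starting point for investigation.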