EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

ValueError when task names collide with local directory names #2122

Open alat-rights opened 1 month ago

alat-rights commented 1 month ago

Steps to reproduce

$ cd lm-eval-harness
$ pip install -e .[vllm]
$ mkdir hellaswag
$ lm-eval --tasks hellaswag --model vllm --model_args pretrained=deepseek-ai/deepseek-coder-1.3b-instruct,dtype=float16,tensor_parallel_size=1 --limit 5

Observed error:

ubuntu:~/evalharness$ lm-eval --tasks hellaswag --model vllm --model_args pretrained=deepseek-ai/deepseek-coder-1.3b-instruct,dtype=float16,tensor_parallel_size=1 --limit 5
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.25.2
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
2024-07-20 13:09:49.266468: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-20 13:09:49.309275: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX512F AVX512_VNNI, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/lm-eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/home/ubuntu/evalharness/lm_eval/__main__.py", line 382, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/home/ubuntu/evalharness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
  File "/home/ubuntu/evalharness/lm_eval/evaluator.py", line 163, in simple_evaluate
    raise ValueError(
ValueError: No tasks specified, or no tasks found. Please verify the task names.

Why this (maybe) matters

I came across this bug when I tried to log mmlu samples to a folder called mmlu. This seems like a normal thing to try to do, and the error could be confusing.

Mitigations

I think it would be nice if there was just a clearer error message for when this happens.
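
As a rough illustration of what such a message could look like, here is a minimal sketch. It is not the harness's actual code: the helper names `find_directory_collisions` and `no_tasks_found_message` are hypothetical, and the sketch assumes the failure happens because a task name that matches a local directory is read as a path instead of a task name.

```python
from pathlib import Path

# Message text copied from the traceback in this issue.
BASE_MESSAGE = "No tasks specified, or no tasks found. Please verify the task names."


def find_directory_collisions(task_names, search_dir="."):
    """Return the requested task names that also exist as directories in search_dir.

    Hypothetical helper, not part of lm-evaluation-harness.
    """
    base = Path(search_dir)
    return [name for name in task_names if name and (base / name).is_dir()]


def no_tasks_found_message(task_names, search_dir="."):
    """Build the 'no tasks found' message, adding a hint when a name matches a directory."""
    collisions = find_directory_collisions(task_names, search_dir)
    if not collisions:
        return BASE_MESSAGE
    # Assumption: the colliding name may have been treated as a path, not a task name.
    hint = (
        " Note: the following task names also match local directories and may "
        "have been read as paths: " + ", ".join(collisions) + ". "
        "Consider renaming the directory or running from a different working directory."
    )
    return BASE_MESSAGE + hint
```

With a `hellaswag` directory in the working directory, `no_tasks_found_message(["hellaswag"])` would return the original message followed by a hint naming `hellaswag`, which points the user at the directory instead of the task list.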

Code points

haileyschoelkopf commented 1 month ago

Hi! Thanks for opening this, definitely high-priority to fix. Thought I'd handled some other edge cases relating to tasks shadowing other tasks, but evidently not this one. I'm away at a conference but can try to push a fix ASAP.