Across several models and several BBH tasks, I consistently hit the following error:
match = [m for m in match if m][0]
IndexError: list index out of range
The full stack trace is below:
$ lm_eval --model hf --model_args pretrained=01-ai/Yi-6B,trust_remote_code=True --tasks bbh_cot_zeroshot_web_of_lies --batch_size auto:4 --device cuda:1 --output_path eval_results/Yi_6B/bbh_cot_zeroshot_web_of_lies --log_samples
2024-04-08:13:58:41,492 INFO [__main__.py:225] Verbosity set to INFO
2024-04-08:13:58:41,492 INFO [__init__.py:373] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-04-08:13:58:44,990 INFO [__main__.py:311] Selected Tasks: ['bbh_cot_zeroshot_web_of_lies']
2024-04-08:13:58:44,990 INFO [__main__.py:312] Loading selected tasks...
2024-04-08:13:58:44,992 INFO [evaluator.py:129] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-04-08:13:58:46,670 WARNING [logging.py:61] Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
2024-04-08:13:58:46,670 INFO [huggingface.py:162] Using device 'cuda:1'
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.09it/s]
2024-04-08:13:58:49,152 INFO [evaluator.py:190] get_task_dict has been updated to accept an optional argument, `task_manager`Read more here:https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md#external-library-usage
2024-04-08:13:58:50,931 WARNING [task.py:322] [Task: bbh_cot_zeroshot_web_of_lies] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
2024-04-08:13:58:50,931 WARNING [task.py:322] [Task: bbh_cot_zeroshot_web_of_lies] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
2024-04-08:13:58:50,937 INFO [task.py:395] Building contexts for bbh_cot_zeroshot_web_of_lies on rank 0...
100%|██████████| 250/250 [00:00<00:00, 2151.14it/s]
2024-04-08:13:58:51,057 INFO [evaluator.py:357] Running generate_until requests
Running generate_until requests: 0%| | 0/250 [00:00<?, ?it/s]Passed argument batch_size = auto. Detecting largest batch size
Determined Largest batch size: 16
Running generate_until requests: 100%|██████████| 250/250 [01:18<00:00, 3.19it/s]
Traceback (most recent call last):
File "/lfs/skampere1/0/rschaef/miniconda3/envs/pred_llm_evals_env/bin/lm_eval", line 8, in <module>
sys.exit(cli_evaluate())
File "/lfs/skampere1/0/rschaef/KoyejoLab-Predictable-LLM-Evals/submodules/lm-evaluation-harness/lm_eval/__main__.py", line 318, in cli_evaluate
results = evaluator.simple_evaluate(
File "/lfs/skampere1/0/rschaef/KoyejoLab-Predictable-LLM-Evals/submodules/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
return fn(*args, **kwargs)
File "/lfs/skampere1/0/rschaef/KoyejoLab-Predictable-LLM-Evals/submodules/lm-evaluation-harness/lm_eval/evaluator.py", line 230, in simple_evaluate
results = evaluate(
File "/lfs/skampere1/0/rschaef/KoyejoLab-Predictable-LLM-Evals/submodules/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
return fn(*args, **kwargs)
File "/lfs/skampere1/0/rschaef/KoyejoLab-Predictable-LLM-Evals/submodules/lm-evaluation-harness/lm_eval/evaluator.py", line 383, in evaluate
task.apply_filters()
File "/lfs/skampere1/0/rschaef/KoyejoLab-Predictable-LLM-Evals/submodules/lm-evaluation-harness/lm_eval/api/task.py", line 971, in apply_filters
f.apply(self._instances)
File "/lfs/skampere1/0/rschaef/KoyejoLab-Predictable-LLM-Evals/submodules/lm-evaluation-harness/lm_eval/api/filter.py", line 51, in apply
resps = f().apply(resps, docs)
File "/lfs/skampere1/0/rschaef/KoyejoLab-Predictable-LLM-Evals/submodules/lm-evaluation-harness/lm_eval/filters/extraction.py", line 46, in apply
filtered_resps = list(map(lambda x: filter_set(x), resps))
File "/lfs/skampere1/0/rschaef/KoyejoLab-Predictable-LLM-Evals/submodules/lm-evaluation-harness/lm_eval/filters/extraction.py", line 46, in <lambda>
filtered_resps = list(map(lambda x: filter_set(x), resps))
File "/lfs/skampere1/0/rschaef/KoyejoLab-Predictable-LLM-Evals/submodules/lm-evaluation-harness/lm_eval/filters/extraction.py", line 38, in filter_set
match = [m for m in match if m][0]
IndexError: list index out of range
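
I believe the root cause is that the answer-extraction regex finds no match in the model's chain-of-thought output, so the list comprehension in `filter_set` produces an empty list and indexing `[0]` raises. Here is a minimal sketch of that failure mode (the function and the regex pattern are simplified stand-ins, not the harness's exact code in `lm_eval/filters/extraction.py`):

```python
import re

def filter_set(resps, pattern=r"answer is (\w+)"):
    """Simplified stand-in for the filter logic in extraction.py."""
    filtered = []
    for resp in resps:
        match = re.findall(pattern, resp)
        # If the model's output never matches the pattern, `match` is
        # empty and [0] raises IndexError, as in the traceback above.
        match = [m for m in match if m][0]
        filtered.append(match)
    return filtered

# A generation that contains an extractable answer works fine:
print(filter_set(["Let's think step by step. So the answer is Yes."]))

# A generation that never states an answer reproduces the crash:
try:
    filter_set(["The statements are contradictory, so I cannot decide."])
except IndexError as e:
    print("IndexError:", e)
```

A defensive fix would be to fall back to a sentinel (e.g. `"[invalid]"`) when no match is found instead of indexing into an empty list, so one unparseable generation doesn't abort the whole evaluation run.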