Open · zhabuye opened this issue 2 months ago
group: basque-glue
task: bhtc_v2, bec, vaxx_stance, qnlieu, wiceu, epec_korref_bin
issue:
The tasks bec and epec_korref_bin both fail with "Tasks not found". After reviewing the YAML files, I found bec2016eu and epec_koref_bin instead; is there a mistake in the README description?
The above group and all of the tasks have also encountered this error:
File "/lm-evaluation-harness/lm_eval/evaluator_utils.py", line 80, in from_taskdict
n_shot = task_config.get("metadata", {}).get("num_fewshot", 0)
AttributeError: 'list' object has no attribute 'get'
done #1913
group: belebele
task: belebele_kac_Latn
issue:
Single-GPU evaluation passed, but multi-GPU evaluation failed with requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://huggingface.co/api/datasets/facebook/belebele/paths-info/75b399394a9803252cfec289d103de462763db7c.
Several other tasks similarly download and evaluate normally on a single GPU but fail during multi-GPU evaluation, mostly with network-related errors or a ValueError. The list is as follows (see the cache-warming sketch after the list):
codexglue_code2text
french_bench
gpt3_translation_benchmarks
triviaqa
xnli
translation
generate_until
openllm
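These multi-GPU failures look like every rank hitting the Hub at once. A possible mitigation, sketched under the assumption that the network errors are the only problem, is to warm the local cache in a single process before launching the multi-GPU run:

```python
from huggingface_hub import snapshot_download

# Pre-fetch the dataset repos into the local HF cache so the multi-GPU run
# can read from disk; the repo list here is illustrative, not exhaustive.
for repo_id in ["facebook/belebele"]:
    snapshot_download(repo_id=repo_id, repo_type="dataset")
```

With the cache warm, setting HF_DATASETS_OFFLINE=1 should stop the ranks from issuing the failing Hub requests at all.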
group: pile_10k, gpqa issue: During multi-GPU evaluation, only one or two of the eight cards show high utilization for these two tasks, which leads to an out-of-memory error even when the batch size for pile_10k is set to 1.
group: scrolls
task: scrolls_contractnli, scrolls_govreport, scrolls_narrativeqa, scrolls_qasper, scrolls_qmsum, scrolls_quality, scrolls_summscreenfd
issue:
Both single-GPU and multi-GPU evaluation report the error ValueError: A random.Random generator argument must be provided to rnd.
Additionally, evaluating with scrolls directly shows Tasks not found, meaning the group name cannot be used to bundle and evaluate the subtasks.
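For context, this ValueError is the harness guarding its few-shot sampler: the sampling call expects a seeded generator, and the scrolls call site apparently does not pass one. A minimal sketch of the failing contract (the function below is illustrative, not the harness's actual code):

```python
import random

def sample_fewshot(docs, k, rnd=None):
    # Stand-in for the harness's few-shot sampling: callers must supply
    # a random.Random instance, or this guard fires.
    if rnd is None:
        raise ValueError("A random.Random generator argument must be provided to `rnd`")
    return rnd.sample(docs, k)

print(sample_fewshot(list(range(100)), k=2, rnd=random.Random(1234)))  # works
```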
task: t0_eval issue: Both single-GPU and multi-GPU evaluation report the error:
File "/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 233, in _load_individual_task_or_group
group_name = name_or_config["group"]
KeyError: 'group'
task: flan_held_out issue: Both single-GPU and multi-GPU evaluation report the error:
File "/lm-evaluation-harness/lm_eval/filters/__init__.py", line 21, in build_filter_ensemble
f = partial(get_filter(function), **kwargs)
TypeError: the first argument must be callable
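This TypeError suggests get_filter returned something non-callable, e.g. the raw name string when a registry lookup misses. A minimal reproduction of the failure mode, with a hypothetical registry:

```python
from functools import partial

FILTER_REGISTRY = {"take_first": lambda resps: resps[:1]}  # hypothetical registry

def get_filter(name):
    # Falling back to the name itself hands partial() a plain string.
    return FILTER_REGISTRY.get(name, name)

partial(get_filter("take_first"))  # fine: the value is a callable
try:
    partial(get_filter("regex_extract"))  # misses the registry
except TypeError as err:
    print(err)  # the first argument must be callable
```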
task: csatqa issue:
datasets.exceptions.DatasetNotFoundError: Dataset 'EleutherAI/csatqa' doesn't exist on the Hub or cannot be accessed. If the dataset is private or gated, make sure to log in with `huggingface-cli login` or visit the dataset page at https://huggingface.co/datasets/EleutherAI/csatqa to ask for access.
It seems that the dataset name on Hugging Face has changed. I hope you can check it.
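Whether the repo id in the task YAML still resolves can be checked directly; dataset_info raises if the dataset is gone, renamed, or gated:

```python
from huggingface_hub import HfApi

# Probe the repo id the task currently points at.
try:
    info = HfApi().dataset_info("EleutherAI/csatqa")
    print(info.id)
except Exception as err:
    print(f"dataset not reachable: {err}")
```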
task: bigbench_generate_until issue: Both single-GPU and multi-GPU evaluation report the error:
File "/data2/miniconda3/envs/ZYB/lib/python3.8/site-packages/datasets/arrow_reader.py", line 255, in read
raise ValueError(msg)
ValueError: Instruction "train" corresponds to no data!
task: bigbench_multiple_choice issue: Both single-GPU and multi-GPU evaluation report the error:
File "/data2/miniconda3/envs/ZYB/lib/python3.8/site-packages/jinja2/environment.py", line 1301, in render
self.environment.handle_exception()
File "/data2/miniconda3/envs/ZYB/lib/python3.8/site-packages/jinja2/environment.py", line 936, in handle_exception
raise rewrite_traceback_stack(source=source)
File "<template>", line 1, in top-level template code
ValueError: "English: You don't want to push the button lightly, but rather punch it hard." is not in list
task: ifeval issue: Both single-GPU and multi-GPU evaluation report the error:
File "/home/ZYB/project/lm-evaluation-harness/lm_eval/tasks/ifeval/utils.py", line 9, in <module>
class InputExample:
File "/home/ZYB/project/lm-evaluation-harness/lm_eval/tasks/ifeval/utils.py", line 11, in InputExample
instruction_id_list: list[str]
TypeError: 'type' object is not subscriptable
It passes in Python 3.9.
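For context, Python 3.8 evaluates the builtin-generic annotation list[str] at class-definition time, and builtin generics only became subscriptable in 3.9. A minimal reproduction plus the typing-based spelling that works on 3.8 (the prompt field is assumed, for illustration only):

```python
import dataclasses
from typing import List

# On 3.8, `instruction_id_list: list[str]` would raise
# TypeError: 'type' object is not subscriptable at class creation.
@dataclasses.dataclass
class InputExample:
    instruction_id_list: List[str]  # typing.List works on 3.8 and later
    prompt: str                     # assumed field, for illustration only
```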
task: pile issue: Both single-GPU and multi-GPU evaluation report the error:
File "/home/anaconda3/envs/ZYB/lib/python3.8/site-packages/datasets/builder.py", line 374, in __init__
self.config, self.config_id = self._create_builder_config(
File "/home/anaconda3/envs/ZYB/lib/python3.8/site-packages/datasets/builder.py", line 599, in _create_builder_config
raise ValueError(
ValueError: BuilderConfig 'pile_openwebtext2' not found. Available: ['all', 'enron_emails', 'europarl', 'free_law', 'hacker_news', 'nih_exporter', 'pubmed', 'pubmed_central', 'ubuntu_irc', 'uspto', 'github']
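The available-config list from the error can be confirmed directly, assuming the task points at the EleutherAI/pile dataset repo:

```python
from datasets import get_dataset_config_names

# Lists the subset configs the hosted dataset actually exposes; the
# 'pile_openwebtext2' config the task asks for is evidently not among them.
print(get_dataset_config_names("EleutherAI/pile"))
```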
task: storycloze issue: Both single-GPU and multi-GPU evaluation report the error:
File "/home/anaconda3/envs/ZYB/lib/python3.8/site-packages/datasets/builder.py", line 374, in __init__
self.config, self.config_id = self._create_builder_config(
File "/home/anaconda3/envs/ZYB/lib/python3.8/site-packages/datasets/builder.py", line 612, in _create_builder_config
builder_config = self.BUILDER_CONFIG_CLASS(**config_kwargs)
File "<string>", line 8, in __init__
File "/home/anaconda3/envs/ZYB/lib/python3.8/site-packages/datasets/builder.py", line 131, in __post_init__
if invalid_char in self.name:
TypeError: argument of type 'int' is not iterable
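One hypothesis for this TypeError: the config name reaching BuilderConfig is an unquoted year from the task YAML, which parses as an int, and datasets then tries to scan it for invalid characters. The parsing difference is easy to check:

```python
import yaml

# An unquoted year parses as an int; quoting keeps the config name a string,
# which is what BuilderConfig's name validation expects.
print(yaml.safe_load("dataset_name: 2016"))    # {'dataset_name': 2016}
print(yaml.safe_load('dataset_name: "2016"'))  # {'dataset_name': '2016'}
```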
task: wmt-t5-prompt issue: Both single-GPU and multi-GPU evaluation report the error (the same traceback is printed once per worker process):
Traceback (most recent call last):
File "/home/ZYB/project/lm-evaluation-harness/lm_eval/api/task.py", line 1369, in process_results
result_score = self._metric_fn_list[metric](
TypeError: 'NoneType' object is not callable
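The NoneType here means the task's metric name resolved to no registered function. A minimal illustration of that failure mode (the dictionary contents are assumed):

```python
# If metric registration leaves this task's entry as None, the later call
# in process_results is calling None.
metric_fn_list = {"bleu": None}
try:
    metric_fn_list["bleu"](["reference"], ["hypothesis"])
except TypeError as err:
    print(err)  # 'NoneType' object is not callable
```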
I have the same problem with storycloze.
Hi @zhabuye , thanks very much for going through these!
I'll try to respond or address these one by one ASAP.
group: basque-glue task: bhtc_v2, bec, vaxx_stance, qnlieu, wiceu, epec_korref_bin issue: The tasks bec and epec_korref_bin both have the issue of Tasks not found; after reviewing the YAML file, I found bec2016eu and epec_koref_bin. I would like to ask if there is a mistake in the README description? The above group and tasks have both encountered the issue: File "/lm-evaluation-harness/lm_eval/evaluator_utils.py", line 80, in from_taskdict n_shot = task_config.get("metadata", {}).get("num_fewshot", 0) AttributeError: 'list' object has no attribute 'get'
A minor change in the YAML files resolves the AttributeError for me: remove the hyphen (-) in front of version, so the metadata block reads:
metadata:
  version: 1.0
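For anyone curious why the hyphen matters, a minimal illustration with plain PyYAML, independent of the harness:

```python
import yaml

# With the hyphen, metadata parses as a list of one dict, so .get() fails.
broken = yaml.safe_load("metadata:\n  - version: 1.0")
print(type(broken["metadata"]))  # <class 'list'>

# Without the hyphen, metadata parses as a dict, which .get() expects.
fixed = yaml.safe_load("metadata:\n  version: 1.0")
print(fixed["metadata"].get("num_fewshot", 0))  # 0
```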
Thank you very much for the proposal; it really solved the problem. I will submit a pull request accordingly.
When using the --tasks bbh option, it appears that the results only include the bbh_cot_fewshot group. Were the other three groups omitted?
Importing List from the typing module can avoid this error.
Maybe fewer code changes with the addition of from __future__ import annotations in the import sections?
https://github.com/asottile/pyupgrade may also help check for missed compatibility fixes.
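A sketch of the from __future__ import annotations approach suggested above; the future import stops annotations from being evaluated at class-definition time, so the builtin-generic spelling survives on 3.8 (field list abbreviated for illustration):

```python
from __future__ import annotations  # must precede the other imports

import dataclasses

@dataclasses.dataclass
class InputExample:
    # With the future import this annotation stays an unevaluated string,
    # so list[str] no longer trips Python 3.8.
    instruction_id_list: list[str]
```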
Having the exact same issue ...
task: storycloze issue: Both single-GPU and multi-GPU evaluation report the error:
File "/home/anaconda3/envs/ZYB/lib/python3.8/site-packages/datasets/builder.py", line 374, in __init__
self.config, self.config_id = self._create_builder_config(
File "/home/anaconda3/envs/ZYB/lib/python3.8/site-packages/datasets/builder.py", line 612, in _create_builder_config
builder_config = self.BUILDER_CONFIG_CLASS(**config_kwargs)
File "<string>", line 8, in __init__
File "/home/anaconda3/envs/ZYB/lib/python3.8/site-packages/datasets/builder.py", line 131, in __post_init__
if invalid_char in self.name:
TypeError: argument of type 'int' is not iterable
Same problem for me; it didn't work.
I am currently verifying all tasks under lm-evaluation-harness and will raise the issues I encounter one by one in this thread. Thank you for your inspection and responses! @haileyschoelkopf @lintangsutawika Below is the command template I executed: