OpenGPTX / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.
MIT License

pawsx_de, pawsx_en, x_stance, gnad10 error message for metrics #76

Open katrinklug opened 1 year ago

katrinklug commented 1 year ago

@KlaudiaTH When running the tasks pawsx_de, pawsx_en, x_stance, and gnad10 I get the following error (the example below is for pawsx):

File "./tasks/eval_harness/evaluate.py", line 446, in <module> main() File "./tasks/eval_harness/evaluate.py", line 429, in main results = evaluator.evaluate(adaptor, {task_name: task}, False, 0, None, bootstrap_iters=args.bootstrap_iters) File "/lm-evaluation-harness/lm_eval/utils.py", line 162, in _wrapper return fn(*args, **kwargs) File "/lm-evaluation-harness/lm_eval/evaluator.py", line 288, in evaluate results[task_name][metric] = task.aggregation()[real_metric](items) File "/lm-evaluation-harness/lm_eval/tasks/pawsx.py", line 55, in _pawsx_agg_precision precision_metric = datasets.load_metric("precision") File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1389, in load_metric metric_module = metric_module_factory( File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1331, in metric_module_factory raise e1 from None File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1319, in metric_module_factory return GithubMetricModuleFactory( File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 544, in get_module local_path = self.download_loading_script(revision) File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 538, in download_loading_script return cached_path(file_path, download_config=download_config) File "/opt/conda/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 185, in cached_path Traceback (most recent call last): File "./tasks/eval_harness/evaluate.py", line 446, in <module> main() File "./tasks/eval_harness/evaluate.py", line 429, in main results = evaluator.evaluate(adaptor, {task_name: task}, False, 0, None, bootstrap_iters=args.bootstrap_iters) File "/lm-evaluation-harness/lm_eval/utils.py", line 162, in _wrapper return fn(*args, **kwargs) File "/lm-evaluation-harness/lm_eval/evaluator.py", line 288, in evaluate results[task_name][metric] = task.aggregation()[real_metric](items) File "/lm-evaluation-harness/lm_eval/tasks/pawsx.py", line 55, in _pawsx_agg_precision precision_metric = datasets.load_metric("precision") File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1389, in load_metric metric_module = metric_module_factory( File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1331, in metric_module_factory raise e1 from None File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1319, in metric_module_factory return GithubMetricModuleFactory( File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 544, in get_module local_path = self.download_loading_script(revision) File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 538, in download_loading_script return cached_path(file_path, download_config=download_config) File "/opt/conda/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 185, in cached_path Traceback (most recent call last): File "./tasks/eval_harness/evaluate.py", line 446, in <module> main() File "./tasks/eval_harness/evaluate.py", line 429, in main results = evaluator.evaluate(adaptor, {task_name: task}, False, 0, None, bootstrap_iters=args.bootstrap_iters) File "/lm-evaluation-harness/lm_eval/utils.py", line 162, in _wrapper return fn(*args, **kwargs) File "/lm-evaluation-harness/lm_eval/evaluator.py", line 288, in evaluate results[task_name][metric] = task.aggregation()[real_metric](items) File "/lm-evaluation-harness/lm_eval/tasks/pawsx.py", line 55, in _pawsx_agg_precision precision_metric = datasets.load_metric("precision") File 
"/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1389, in load_metric metric_module = metric_module_factory( File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1331, in metric_module_factory raise e1 from None File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 1319, in metric_module_factory return GithubMetricModuleFactory( File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 544, in get_module local_path = self.download_loading_script(revision) File "/opt/conda/lib/python3.8/site-packages/datasets/load.py", line 538, in download_loading_script return cached_path(file_path, download_config=download_config) File "/opt/conda/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 185, in cached_path output_path = get_from_cache( File "/opt/conda/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 531, in get_from_cache _raise_if_offline_mode_is_enabled(f"Tried to reach {url}") File "/opt/conda/lib/python3.8/site-packages/datasets/utils/file_utils.py", line 261, in _raise_if_offline_mode_is_enabled raise OfflineModeIsEnabled( datasets.utils.file_utils.OfflineModeIsEnabled: Offline mode is enabled. Tried to reach https://raw.githubusercontent.com/huggingface/datasets/2.4.0/metrics/precision/precision.py

KlaudiaTH commented 1 year ago

New images ...
Taurus: `/projects/p025/p_gptx/apptainer_images/obmd-lmeval-21.12_100423-py3.sif`
Juwels: `/p/scratch/opengptx-elm/shared/apptainer_images/obmd-lmeval-21.12_100423-py3.sif`