EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

How can I run the mmlu task in offline mode? #2394

Open 95jinchul opened 1 month ago

95jinchul commented 1 month ago

In #1223 there is a solution for offline mode, so I tried to run the mmlu task using the YAML setting below.

However, for mmlu it is difficult to pass data_files through data_kwargs, because the task is defined through a group configuration.

I usually download and save the datasets in the following way.

    from datasets import load_dataset

    # Subject names as listed in the original post; the tail of the list is elided.
    names = [
        'all', 'abstract_algebra', 'anatomy', 'astronomy', 'business_ethics',
        'clinical_knowledge', 'college_biology', 'college_chemistry',
        'college_computer_science', 'college_mathematics', 'college_medicine',
        'college_physics', 'computer_security', 'conceptual_physics',
        'econometrics',
        # ...
    ]

    for name in names:
        # Download one subject from the Hub, then write it to a local directory.
        dataset = load_dataset("hails/mmlu_no_train", name)
        dataset.save_to_disk(f"dataset/mmlu/{name}")

So, is there any way to load these save_to_disk directories through load_dataset? I would like the harness to read them as they are, without going through the HF Hub, but I keep running into errors.
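For reference, a directory written by `save_to_disk` is normally read back with `datasets.load_from_disk` rather than `load_dataset`. The sketch below assumes the `dataset/mmlu/{name}` layout from the snippet above. The helper names `subject_dir` and `load_saved_subject` are hypothetical, and the sketch does not show how to wire the result into the harness's task configuration.

```python
from pathlib import Path


def subject_dir(root, name):
    """Return the directory that save_to_disk wrote for one MMLU subject."""
    # Strip stray whitespace so 'college_physics ' and 'college_physics' match.
    return Path(root) / name.strip()


def load_saved_subject(root, name):
    """Load one subject saved with save_to_disk, without contacting the Hub."""
    path = subject_dir(root, name)
    if not path.is_dir():
        raise FileNotFoundError(f"No saved dataset at {path}")
    # Imported lazily so the path check above works without `datasets` installed.
    from datasets import load_from_disk

    # Returns a DatasetDict or Dataset, matching whatever was saved.
    return load_from_disk(str(path))
```

Calling `load_saved_subject("dataset/mmlu", "anatomy")` should give back the same splits that were saved for that subject.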

DtYXs commented 1 week ago

+1

jgcb00 commented 6 days ago

Indeed, when I set:

export HF_HUB_OFFLINE=1

MMLU fails with:

huggingface_hub.errors.OfflineModeIsEnabled: Cannot reach https://huggingface.co/api/datasets/cais/mmlu/revision/main: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable.

This doesn't happen with other tasks.

But the whole point is to make it run without internet.
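As a minimal sketch of the offline setup discussed here, the same flag can be set from Python before any Hugging Face library is imported. `HF_DATASETS_OFFLINE` is added as an assumption (it is the matching flag for the `datasets` library and is not mentioned in this thread), and `enable_offline_mode` is a hypothetical helper name. This only helps if the data is already in the local cache, and it does not by itself resolve the MMLU error reported above.

```python
import os

# Both flags are read when the libraries are imported, so set them first.
OFFLINE_ENV = {
    "HF_HUB_OFFLINE": "1",       # the flag used in the comment above
    "HF_DATASETS_OFFLINE": "1",  # assumption: also wanted for `datasets`
}


def enable_offline_mode(env=os.environ):
    """Set the Hugging Face offline flags in the given environment mapping."""
    for key, value in OFFLINE_ENV.items():
        env[key] = value
    return env
```

Call `enable_offline_mode()` at the very top of the launcher script, before importing `datasets`, `huggingface_hub`, or the harness.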