95jinchul opened 1 month ago
+1
Indeed, when setting:
export HF_HUB_OFFLINE=1
MMLU fails with:
huggingface_hub.errors.OfflineModeIsEnabled: Cannot reach https://huggingface.co/api/datasets/cais/mmlu/revision/main: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable.
This doesn't happen with other tasks.
But the whole point is to make it run without internet access.
In #1223, there is a solution for offline mode, so I tried running the mmlu task using the YAML setting below.
However, for mmlu it is difficult to pass data_files through data_kwargs, because the task is mapped to a group configuration (one subtask per subject).
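One possible workaround for the group mapping is to give every subject its own override instead of putting data_files in the shared template. The sketch below builds those per-subject fields, assuming the subjects were saved locally under `dataset/mmlu/<subject>` (the layout the save_to_disk loop below produces) and that the splits are `test`, `validation` and `dev`. It also assumes, without having checked the harness source, that `dataset_path`, `dataset_name` and `dataset_kwargs` are forwarded to `datasets.load_dataset`, and that your `datasets` version ships the generic `"arrow"` builder. `local_data_files` and `subject_override` are hypothetical names.

```python
from pathlib import Path

# Hypothetical local root: where the save_to_disk loop writes each subject.
LOCAL_ROOT = Path("dataset/mmlu")
# Assumed split names for hails/mmlu_no_train; adjust if your copy differs.
SPLITS = ("test", "validation", "dev")


def local_data_files(subject: str, root: Path = LOCAL_ROOT) -> dict:
    """Map each split to a glob over the Arrow shards save_to_disk wrote."""
    return {split: str(root / subject / split / "*.arrow") for split in SPLITS}


def subject_override(subject: str, root: Path = LOCAL_ROOT) -> dict:
    """Per-subject task fields; the shared template cannot hold per-subject paths."""
    return {
        "task": f"mmlu_{subject}",
        # Assumption: these keys reach datasets.load_dataset unchanged, and
        # "arrow" is the generic builder that reads local .arrow files.
        "dataset_path": "arrow",
        # Clear the subject config name so the arrow builder is not asked for it.
        "dataset_name": None,
        "dataset_kwargs": {"data_files": local_data_files(subject, root)},
    }


if __name__ == "__main__":
    for subject in ["abstract_algebra", "anatomy"]:
        print(subject_override(subject))
```

Each dict could then be dumped to a per-subject YAML file (for example with PyYAML's `yaml.safe_dump`), so each subtask reads local files while the group keeps its usual structure.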
Usually, datasets are imported in the following way.
from datasets import load_dataset

# Download every MMLU subject once (while online) and save each one locally.
for name in ['all', 'abstract_algebra', 'anatomy', 'astronomy', 'business_ethics',
             'clinical_knowledge', 'college_biology', 'college_chemistry',
             'college_computer_science', 'college_mathematics', 'college_medicine',
             'college_physics', 'computer_security', 'conceptual_physics',
             'econometrics', ... ]:
    dataset = load_dataset("hails/mmlu_no_train", name)
    dataset.save_to_disk(f"dataset/mmlu/{name}")
So, is there any way to load these save_to_disk directories with load_dataset? I would like the harness to use them as-is, without going through the HF Hub, but I keep running into errors.
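A likely source of the errors is that directories written by `save_to_disk` are meant to be reopened with `datasets.load_from_disk`, not `load_dataset`; recent `datasets` versions typically reject such a directory in `load_dataset` and point to `load_from_disk`. The sketch below is a hypothetical helper (`is_save_to_disk_dir` and `load_local_or_hub` are not part of the harness or of `datasets`) that picks the right loader. Wiring it into the harness's own loading code is left as an assumption, not something the harness is known to support.

```python
from pathlib import Path


def is_save_to_disk_dir(path) -> bool:
    """True if `path` looks like the output of Dataset(Dict).save_to_disk."""
    p = Path(path)
    # DatasetDict.save_to_disk writes dataset_dict.json at the top level;
    # Dataset.save_to_disk writes state.json next to its Arrow shards.
    return (p / "dataset_dict.json").is_file() or (p / "state.json").is_file()


def load_local_or_hub(path, name=None, **kwargs):
    """Hypothetical helper: load_from_disk for saved dirs, else load_dataset."""
    if is_save_to_disk_dir(path):
        # load_from_disk reads the local Arrow files, so no Hub request is made.
        from datasets import load_from_disk

        return load_from_disk(str(path))
    # Anything else (Hub id, builder name) goes through the normal path.
    from datasets import load_dataset

    return load_dataset(path, name, **kwargs)
```

For example, `load_local_or_hub("dataset/mmlu/anatomy")["test"]` would return the saved test split. The `datasets` imports sit inside the function so the directory check itself works without the library installed.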