EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Task Configuration didn't work #1317

Closed WanliYoung closed 6 months ago

WanliYoung commented 7 months ago

Hi, thanks for your great contributions! When I run the command:

```
lm_eval --tasks xxx(sst2,hellaswag,mmlu) --model hf --model_args pretrained=/local/path/to/model --device cuda:1 --batch_size 20
```

I found that the task being evaluated is anli, so I guess the task configuration didn't take effect. There might be something wrong with my usage. Are there any suggestions?

WanliYoung commented 7 months ago

Actually, I added some code here: https://github.com/EleutherAI/lm-evaluation-harness/blob/b93c3bcbf30109c40cd19ee862161702920b9c27/lm_eval/api/task.py#L716 to print the required dataset.

But with the command:

```
lm_eval --tasks xxx(sst2,hellaswag,mmlu) --model hf --model_args pretrained=/local/path/to/model --device cuda:1 --batch_size 20
```

the download function downloads the anli task instead. I don't know why this happened.

lintangsutawika commented 7 months ago

Hi,

I'm not sure what you mean by `--tasks xxx(sst2,hellaswag,mmlu)`.

Could you talk more about what you wanted to try?

WanliYoung commented 7 months ago

Thanks for your response. I need to run your code on a machine that can't access Hugging Face, so I want to download the required task datasets in advance. I added some print statements at https://github.com/EleutherAI/lm-evaluation-harness/blob/b93c3bcbf30109c40cd19ee862161702920b9c27/lm_eval/api/task.py#L716 to print self.DATASET_PATH and self.DATASET_NAME. But when I run the command:

```
lm_eval --tasks sst2 --model hf --model_args pretrained=/local/path/to/model --device cuda:1 --batch_size 20
```

the printed information is `self.DATASET_NAME=None` and `self.DATASET_PATH=anli`. I don't know why this happens.
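In case it helps, the offline workflow I'm aiming for is: resolve each task to its Hugging Face dataset coordinates, fetch those with `datasets.load_dataset` on a connected machine, copy the datasets cache over, and then run with `HF_DATASETS_OFFLINE=1`. A minimal sketch (the task-to-dataset pairs below are assumptions pieced together from this thread, and `staging_commands` is a hypothetical helper, not part of the harness):

```python
# Sketch: map harness task names to (dataset_path, dataset_name) so the
# datasets can be pre-fetched on a machine with internet access.
# NOTE: these pairs are assumptions based on this thread, not an
# authoritative registry; check the configs under lm_eval/tasks/.
TASK_DATASETS = {
    "sst2": ("glue", "sst2"),
    "hellaswag": ("hellaswag", None),
}

def staging_commands(tasks):
    """Return one `load_dataset(...)` call string per requested task,
    to be run on the connected machine before copying the cache over."""
    cmds = []
    for task in tasks:
        path, name = TASK_DATASETS[task]
        args = repr(path) if name is None else f"{path!r}, {name!r}"
        cmds.append(f"load_dataset({args})")
    return cmds

if __name__ == "__main__":
    for cmd in staging_commands(["sst2", "hellaswag"]):
        print(cmd)
```

After running the emitted calls on the connected machine, the populated `~/.cache/huggingface/datasets` directory can be copied to the offline box and the evaluation run with `HF_DATASETS_OFFLINE=1` set.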

lintangsutawika commented 7 months ago

Not sure why that is the case. Looks like a bug I should investigate.

But if you need to find the dataset_name and dataset_path, you could always search in lm_eval/tasks/.

sst2 is in glue/, hellaswag is in hellaswag/, and mmlu is in mmlu/default/.

WanliYoung commented 7 months ago

Actually, it seems to download all the task datasets; I don't know why this happened. I just want to download the dataset for the task I want to evaluate.

```
2024-01-19:10:17:37,859 INFO [utils.py:147] Note: NumExpr detected 48 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-01-19:10:17:37,859 INFO [utils.py:159] NumExpr defaulting to 8 threads.
2024-01-19:10:17:38,714 INFO [config.py:58] PyTorch version 2.0.1 available.
Namespace(model='hf', tasks='sst2', model_args='pretrained=/home/yangwanli/data/llama-7b', num_fewshot=None, batch_size='2', max_batch_size=None, device='cuda:1', output_path=None, limit=None, use_cache=None, decontamination_ngrams_path=None, check_integrity=False, write_out=False, log_samples=False, show_config=False, include_path=None, gen_kwargs=None, verbosity='INFO')
2024-01-19:10:17:45,621 INFO [main.py:156] Verbosity set to INFO
None anli
2024-01-19:10:18:38,294 WARNING [init.py:185] Unexpected error loading config in /home/yangwanli/lm-evaluation-harness/lm_eval/tasks/benchmarks/t0_eval.yaml
Config will not be added to registry
Error: to_dict() got an unexpected keyword argument 'keep_callable'
Traceback (most recent call last):
  File "/home/yangwanli/lm-evaluation-harness/lm_eval/tasks/init.py", line 173, in include_task_folder
    register_configurable_group(config, yaml_path)
  File "/home/yangwanli/lm-evaluation-harness/lm_eval/tasks/init.py", line 75, in register_configurable_group
    base_config = task_obj._config.to_dict(keep_callable=True)
TypeError: to_dict() got an unexpected keyword argument 'keep_callable'

None anli
2024-01-19:10:19:33,526 WARNING [init.py:185] Unexpected error loading config in /home/yangwanli/lm-evaluation-harness/lm_eval/tasks/benchmarks/flan/flan_anli.yaml
Config will not be added to registry
Error: to_dict() got an unexpected keyword argument 'keep_callable'
Traceback (most recent call last):
  File "/home/yangwanli/lm-evaluation-harness/lm_eval/tasks/init.py", line 173, in include_task_folder
    register_configurable_group(config, yaml_path)
  File "/home/yangwanli/lm-evaluation-harness/lm_eval/tasks/init.py", line 75, in register_configurable_group
    base_config = task_obj._config.to_dict(keep_callable=True)
TypeError: to_dict() got an unexpected keyword argument 'keep_callable'

ARC-Easy ai2_arc
Downloading readme: 100%|...| 9.00k/9.00k [00:00<00:00, 4.11MB/s]
Downloading data: 100%|...| 331k/331k [00:02<00:00, 160kB/s]
Downloading data: 100%|...| 346k/346k [00:01<00:00, 341kB/s]
Downloading data: 100%|...| 86.1k/86.1k [00:00<00:00, 108kB/s]
Generating train split: 100%|...| 2251/2251 [00:00<00:00, 71569.51 examples/s]
Generating test split: 100%|...| 2376/2376 [00:00<00:00, 128961.98 examples/s]
Generating validation split: 100%|...| 570/570 [00:00<00:00, 119885.33 examples/s]
2024-01-19:10:20:01,050 WARNING [init.py:185] Unexpected error loading config in /home/yangwanli/lm-evaluation-harness/lm_eval/tasks/benchmarks/flan/flan_arc.yaml
Config will not be added to registry
Error: to_dict() got an unexpected keyword argument 'keep_callable'
Traceback (most recent call last):
  File "/home/yangwanli/lm-evaluation-harness/lm_eval/tasks/init.py", line 173, in include_task_folder
    register_configurable_group(config, yaml_path)
  File "/home/yangwanli/lm-evaluation-harness/lm_eval/tasks/init.py", line 75, in register_configurable_group
    base_config = task_obj._config.to_dict(keep_callable=True)
TypeError: to_dict() got an unexpected keyword argument 'keep_callable'

2024-01-19:10:20:01,065 WARNING [templates.py:384] Tried instantiating DatasetTemplates for gsmk boolq, but no prompts found. Please ignore this warning if you are creating new prompts for this dataset.
2024-01-19:10:20:01,068 WARNING [templates.py:384] Tried instantiating DatasetTemplates for EleutherAI/asdiv, but no prompts found. Please ignore this warning if you are creating new prompts for this dataset.
anatomy hails/mmlu_no_train
/home/yangwanli/anaconda3/lib/python3.9/site-packages/datasets/load.py:1429: FutureWarning: The repository for hails/mmlu_no_train contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/hails/mmlu_no_train
You can avoid this message in future by passing the argument trust_remote_code=True.
Passing trust_remote_code=True will be mandatory to load this dataset from the next major release of datasets.
  warnings.warn(
Downloading builder script: 100%|...| 5.86k/5.86k [00:00<00:00, 4.28MB/s]
Downloading readme: 100%|...| 420/420 [00:00<00:00, 196kB/s]
Downloading data:  42%|...| 70.5M/166M [01:14<01:38, 973kB/s]
```

haileyschoelkopf commented 7 months ago

Can you provide the actual CLI command you are running?

It looks like you are running the t0_eval grouping of tasks, which does include ANLI as a subtask.

It does look as if there is a separate bug from #1315, cc @lintangsutawika

lintangsutawika commented 7 months ago

I think this may have something to do with promptsource-based configs. I think we had a note in TaskConfig about this in the to_dict method.

haileyschoelkopf commented 6 months ago

Checking in on this: I believe this should be fully fixed. We had a bug where a couple of extra datasets were downloaded the first time the library was used, but this no longer occurs, and keep_callable as a config-loading kwarg has now been around for a little while.