EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

HellaSwag with UnicodeDecodeError #1757

Status: Open. Hua-rookie opened this issue 6 months ago.

Hua-rookie commented 6 months ago

When I tried to evaluate HellaSwag with:

    lm_eval --model hf --model_args pretrained=HuggingFaceH4/zephyr-7b-beta,dtype="bfloat16" --tasks hellaswag --device cuda:0 --num_fewshot 10 --batch_size auto --trust_remote_code

I got the following error:

      File "/root/miniconda3/envs/lm_eval/lib/python3.10/site-packages/datasets/load.py", line 2587, in load_dataset
        builder_instance = load_dataset_builder(
      File "/root/miniconda3/envs/lm_eval/lib/python3.10/site-packages/datasets/load.py", line 2259, in load_dataset_builder
        dataset_module = dataset_module_factory(
      File "/root/miniconda3/envs/lm_eval/lib/python3.10/site-packages/datasets/load.py", line 1910, in dataset_module_factory
        raise e1 from None
      File "/root/miniconda3/envs/lm_eval/lib/python3.10/site-packages/datasets/load.py", line 1862, in dataset_module_factory
        can_load_config_from_parquet_export = "DEFAULT_CONFIG_NAME" not in f.read()
      File "/root/miniconda3/envs/lm_eval/lib/python3.10/codecs.py", line 322, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 1: invalid start byte

How can I solve this error?
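For context on the last traceback line: this is what Python raises when a text-mode `f.read()` meets bytes that are not valid UTF-8. The sketch below reproduces the exact message with a throwaway file whose first bytes are the Zstandard frame magic number (28 B5 2F FD), chosen only because its second byte is 0xb5; the thread does not establish what the file read by `datasets` actually contained. `first_utf8_error` is a hypothetical helper; pointing it at a suspect cached file is one way to check whether that file is really UTF-8 text.

```python
import os
import tempfile
from typing import Optional

# Zstandard frame magic number 0xFD2FB528, stored little-endian: 28 B5 2F FD.
# Used only as sample bytes whose second byte is 0xb5; the thread does not
# establish what the file read by datasets actually contained.
SAMPLE_BYTES = b"\x28\xb5\x2f\xfd"


def first_utf8_error(path: str) -> Optional[UnicodeDecodeError]:
    """Read `path` as UTF-8 text, like `f.read()` in the traceback.

    Returns the UnicodeDecodeError if the file is not valid UTF-8, else None.
    """
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError as err:
        return err
    return None


if __name__ == "__main__":
    # Write the sample bytes to a throwaway file, then read it back as text.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(SAMPLE_BYTES)
    try:
        print(first_utf8_error(tmp.name))
        # prints: 'utf-8' codec can't decode byte 0xb5 in position 1: invalid start byte
    finally:
        os.remove(tmp.name)
```

A file that decodes cleanly returns `None`, so the same helper distinguishes a genuine text file from a binary or compressed payload.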

zjuruizhechen commented 6 months ago

same problem

PotatoBearP commented 6 months ago

Encountered the same issue in a local environment.

Shuizhimei commented 6 months ago

same issue

huangwei021230 commented 6 months ago

same issue

cs32963 commented 6 months ago

same issue

update: it is working now

haileyschoelkopf commented 6 months ago

I can't seem to replicate this on a fresh HF cache, though perhaps I did something wrong. Is the connection to the HF Hub working for those facing this problem?

Hua-rookie commented 6 months ago

That doesn't seem to be the problem; the connection works fine on my machine.
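To retry the failing command against a fresh cache, as the maintainer did, one option is to give a single run an empty `datasets` cache. A minimal sketch, assuming that `HF_DATASETS_CACHE` (the environment variable `datasets` reads for its cache location) takes effect when set before the library is imported, which holds for a child process. Only that cache is replaced; other Hugging Face caches, such as the Hub download cache holding the model weights or the modules cache (`HF_MODULES_CACHE`), are left as they are. `fresh_datasets_cache_env` and `LM_EVAL_CMD` are hypothetical names.

```python
import os
import subprocess
import tempfile
from typing import Dict, List

# The command from the original report in argv form. The shell strips the
# quotes around "bfloat16", so the argument lm_eval receives is dtype=bfloat16.
LM_EVAL_CMD: List[str] = [
    "lm_eval",
    "--model", "hf",
    "--model_args", "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=bfloat16",
    "--tasks", "hellaswag",
    "--device", "cuda:0",
    "--num_fewshot", "10",
    "--batch_size", "auto",
    "--trust_remote_code",
]


def fresh_datasets_cache_env() -> Dict[str, str]:
    """Hypothetical helper: copy os.environ, pointing HF_DATASETS_CACHE at a new empty directory."""
    env = dict(os.environ)
    env["HF_DATASETS_CACHE"] = tempfile.mkdtemp(prefix="hf_datasets_cache_")
    return env


if __name__ == "__main__":
    env = fresh_datasets_cache_env()
    print("Using datasets cache:", env["HF_DATASETS_CACHE"])
    # The child process imports datasets after the variable is set, so it
    # starts from an empty datasets cache; Hub and model caches are untouched.
    subprocess.run(LM_EVAL_CMD, env=env, check=True)
```

If the error disappears with an empty cache but returns with the default one, the existing cache contents are a likely suspect.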

savannahfan commented 6 months ago

same problem

update: it is working now

Hua-rookie commented 6 months ago

So what changes did you make?

rangehow commented 5 months ago

Same problem with the drop task.

sci-m-wang commented 5 months ago

Waiting for a solution...

chen1yunan commented 4 months ago

Same problem. Has anyone solved it?

chen1yunan commented 4 months ago

Downloading the dataset from the HF Hub to a local directory and then modifying the task YAML files to point to it works.
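A minimal sketch of this local-dataset workaround, under several assumptions: the rows are fetched somewhere loading still works (for example `datasets.load_dataset("Rowan/hellaswag")`, assuming that is the Hub ID your task YAML uses), written out as JSON Lines, and the task YAML is then pointed at those files with the generic `json` builder through `dataset_path: json` and `dataset_kwargs: data_files:`. Check the harness's task-configuration documentation for your version before relying on those keys. `export_splits_to_jsonl` and `local_dataset_yaml` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Mapping


def export_splits_to_jsonl(splits: Mapping[str, Iterable[dict]], out_dir: str) -> Dict[str, str]:
    """Write each split to <out_dir>/<split>.jsonl (one JSON object per line).

    Returns {split_name: absolute_path}. A datasets.DatasetDict also fits this
    signature, since it maps split names to iterables of row dicts.
    """
    out = Path(out_dir).resolve()
    out.mkdir(parents=True, exist_ok=True)
    paths: Dict[str, str] = {}
    for split, rows in splits.items():
        path = out / f"{split}.jsonl"
        with path.open("w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row, ensure_ascii=False) + "\n")
        paths[split] = str(path)
    return paths


def local_dataset_yaml(paths: Mapping[str, str]) -> str:
    """Render task-YAML keys that load the exported files with the `json` builder.

    Assumes the task config accepts `dataset_kwargs`; paths are emitted as
    JSON strings, which YAML parses as double-quoted scalars.
    """
    lines = ["dataset_path: json", "dataset_name: null", "dataset_kwargs:", "  data_files:"]
    for split, path in paths.items():
        lines.append(f"    {split}: {json.dumps(path)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy stand-in rows; in practice, e.g. datasets.load_dataset("Rowan/hellaswag")
    # run in an environment where the download works.
    demo = {"validation": [{"ctx": "A man is sitting on a roof.", "label": "3"}]}
    print(local_dataset_yaml(export_splits_to_jsonl(demo, "hellaswag_local")))
```

The printed keys would replace the `dataset_path`/`dataset_name` lines in a copy of the task YAML, leaving the split names and prompt settings as they were.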

eggry commented 1 month ago

A workaround is to downgrade datasets to 2.14.6.

For more details, please refer to this issue: https://github.com/huggingface/datasets/issues/6760#issuecomment-2041390144
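After downgrading (for example with `pip install "datasets==2.14.6"`), it is worth confirming which version the evaluation interpreter actually imports, since a different virtual environment or conda env can silently keep the newer release. A minimal standard-library sketch; `datasets_pin_status` is a hypothetical helper, and whether other packages in your environment accept that version is not covered in this thread.

```python
from importlib.metadata import PackageNotFoundError, version

# Version named in the workaround comment.
WORKAROUND_VERSION = "2.14.6"


def datasets_pin_status(expected: str = WORKAROUND_VERSION) -> str:
    """Report whether the installed `datasets` distribution matches `expected`."""
    try:
        installed = version("datasets")
    except PackageNotFoundError:
        return "datasets is not installed"
    if installed == expected:
        return f"datasets {installed} matches the workaround version"
    return f"datasets {installed} is installed; the workaround uses {expected}"


if __name__ == "__main__":
    print(datasets_pin_status())
```

Run it with the same Python interpreter that `lm_eval` uses.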