EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Error when running lm_eval with piqa task with EleutherAI/gpt-j-6b #1744

Closed JingyangXiang closed 6 months ago

JingyangXiang commented 6 months ago

Description

An error occurs when running lm_eval with the piqa task using EleutherAI/gpt-j-6b as follows:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Steps to Reproduce:

  1. Start evaluation:

     lm_eval --model hf \
         --model_args pretrained=EleutherAI/gpt-j-6b,parallelize=True,load_in_4bit=True,peft=nomic-ai/gpt4all-j-lora \
         --tasks piqa \
         --device cuda:0

zjuruizhechen commented 6 months ago

Hi, sorry to bother you. How can this be solved?

Thanks.

JingyangXiang commented 6 months ago

lm_eval cannot preprocess piqa automatically. You can visit https://huggingface.co/datasets/piqa, download piqa.py, and use the datasets package to preprocess piqa into the .cache directory before you run lm_eval.
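
The workaround above might be sketched roughly as follows. This is a minimal sketch, not the exact steps used in this thread: it assumes piqa.py has already been downloaded to a local path and that the installed datasets release still supports script-based loading. The helper name prebuild_dataset and its load_fn parameter are hypothetical, introduced only for illustration. Some datasets releases need extra arguments for loading scripts (for example trust_remote_code=True), so keyword arguments are passed through unchanged.

```python
from pathlib import Path


def prebuild_dataset(script_path, load_fn=None, **load_kwargs):
    """Build a dataset from a local loading script so the result is cached.

    script_path: local copy of the loading script (e.g. a downloaded piqa.py).
    load_fn: loader to call; defaults to datasets.load_dataset (hypothetical
        hook, added here so the helper can be exercised without the package).
    load_kwargs: forwarded unchanged to the loader.
    """
    script = Path(script_path)
    if not script.is_file():
        raise FileNotFoundError(f"loading script not found: {script}")
    if load_fn is None:
        # Imported lazily so the path check above works without the package.
        from datasets import load_dataset
        load_fn = load_dataset
    # Loading through the script downloads and prepares the data, which the
    # datasets package stores in its cache directory.
    return load_fn(str(script), **load_kwargs)
```

Calling prebuild_dataset("piqa.py") once before running lm_eval would prepare the data locally. Whether lm_eval then picks up that cached copy depends on the installed datasets version and is not confirmed here.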

ss7424Refar commented 4 months ago

I encountered the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 1: invalid start byte while using hellaswag. My solution is the following:

(image: screenshot of the solution, not preserved)
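
One possible reading of the two error messages in this thread, not confirmed by any participant: byte 0x8b at position 1 matches the second byte of the gzip magic number (1f 8b), and byte 0xb5 at position 1 matches the second byte of the Zstandard magic number (28 b5 2f fd). That would be consistent with compressed data being decoded as UTF-8 text. The sketch below reproduces both messages and checks a file's leading bytes; the helper names utf8_error and looks_compressed are hypothetical.

```python
# Leading bytes ("magic numbers") of two common compressed formats.
GZIP_MAGIC = b"\x1f\x8b"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def utf8_error(data):
    """Return the UnicodeDecodeError message for data, or None if it decodes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


def looks_compressed(path):
    """Report which compressed format, if any, the file's first bytes match."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(ZSTD_MAGIC):
        return "zstd"
    return None


print(utf8_error(GZIP_MAGIC))
# 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
print(utf8_error(ZSTD_MAGIC))
# 'utf-8' codec can't decode byte 0xb5 in position 1: invalid start byte
```

If looks_compressed returns "gzip" or "zstd" for a file that a loader is trying to read as text, the file was probably stored or downloaded in compressed form, which may help narrow down where the decoding goes wrong.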