Closed dongguanting closed 4 months ago
Hi, I can't reproduce the error. Are you using `main`? Maybe the dataset wasn't downloaded successfully; could you try again? Also make sure you don't have another package installed called `HumanEval` by any chance.
I have faced the same problem. The problem appears to be at https://github.com/bigcode-project/bigcode-evaluation-harness/blob/68aea1805283b123a4e745e696e041c47cca4993/lm_eval/base.py#L27
If the `load_dataset` function raises, no error is propagated; only a warning message is produced, which is hidden by the traceback in the @dongguanting screenshot. In my case the problem was with access rights to the dataset folder. The fact that neither I, nor you @loubnabnl, nor @dongguanting figured out what was wrong right away suggests that this part of the code should be refactored. The behavior of the `Task` class here is quite implicit, while the Zen of Python tells us that "Explicit is better than implicit". The implication here is that the `dataset` attribute can be set outside of the base `Task` class. The only task which uses this implication is MultiPL-E: https://github.com/bigcode-project/bigcode-evaluation-harness/blob/68aea1805283b123a4e745e696e041c47cca4993/lm_eval/tasks/multiple.py#L89
Note that the `load_dataset` function will be called twice here. Possible bug?
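To make the double call concrete, here is a minimal offline sketch. The class names mirror the harness, but the stub `load_dataset`, the dataset path, and the dataset contents are assumptions for illustration only:

```python
# Stub standing in for datasets.load_dataset so the call count is observable
# offline; the real function hits the HF hub or a local cache.
calls = []

def load_dataset(path, name=None):
    calls.append((path, name))
    return {"test": [{"stop_tokens": ["\ndef"]}]}

class Task:
    DATASET_PATH = "some/dataset"  # hypothetical path
    DATASET_NAME = None

    def __init__(self):
        # First call: the base class loads the dataset unconditionally.
        self.dataset = load_dataset(path=self.DATASET_PATH, name=self.DATASET_NAME)

class MultiPLE(Task):
    def __init__(self):
        super().__init__()
        # Second call: the subclass requests the same dataset again just to
        # read the stop words (the pattern at multiple.py#L89).
        self.stop_words = load_dataset(
            path=self.DATASET_PATH, name=self.DATASET_NAME
        )["test"][0]["stop_tokens"]

MultiPLE()
print(len(calls))  # 2 -- the same dataset is requested twice
```

Whether this is an actual bug depends on caching behavior, but the redundancy is visible in the call count.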
I believe a better way to allow creating the dataset outside of the `Task` class is to make `dataset` an optional argument of `__init__`. The default value is `None`; if a dataset is passed, we do not execute `self.dataset = load_dataset(...)`.
To illustrate, this is how the code looks now:

```python
class Task(ABC):
    ...
    def __init__(self, stop_words=None, requires_execution=True):
        ...
        try:
            self.dataset = load_dataset(path=self.DATASET_PATH, name=self.DATASET_NAME)
        except Exception as e:
            warn(
                f"Loading the dataset failed with {str(e)}. This task will use a locally downloaded dataset, not from the HF hub."
            )
```
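The failure mode is easy to reproduce in isolation. In this sketch the class and the raised exception are stand-ins for the harness's real code, but the except/warn structure matches base.py: the swallowed error only resurfaces much later as an unrelated `AttributeError`:

```python
from warnings import warn

class BrokenTask:
    """Stand-in for Task: swallows the load failure like base.py does."""

    def __init__(self):
        try:
            # Stand-in for a failing load_dataset call, e.g. bad folder permissions.
            raise PermissionError("Permission denied: './dataset_cache'")
        except Exception as e:
            warn(
                f"Loading the dataset failed with {str(e)}. This task will use "
                "a locally downloaded dataset, not from the HF hub."
            )
            # Note: self.dataset is never assigned on this path.

task = BrokenTask()  # emits only a warning, easy to scroll past
print(hasattr(task, "dataset"))  # False -- the real error surfaces much later
```

Any later access to `task.dataset` then fails far from the root cause, which is why the traceback hides the warning in practice.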
And here is what I propose:

```python
class Task(ABC):
    ...
    def __init__(self, stop_words=None, requires_execution=True, dataset=None):
        ...
        if dataset is None:
            self.dataset = load_dataset(path=self.DATASET_PATH, name=self.DATASET_NAME)
        else:
            self.dataset = dataset
```
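Under that proposal, a caller such as MultiPL-E could load the dataset once up front and inject it. This usage sketch uses a stub `load_dataset` and a toy dataset, both assumptions made so it runs offline:

```python
def load_dataset(path=None, name=None):
    # Stub for datasets.load_dataset so the sketch runs offline.
    return {"test": [{"prompt": "def add(a, b):"}]}

class Task:
    DATASET_PATH = "some/dataset"  # hypothetical
    DATASET_NAME = None

    def __init__(self, stop_words=None, requires_execution=True, dataset=None):
        # Only load when the caller did not supply a dataset.
        if dataset is None:
            self.dataset = load_dataset(path=self.DATASET_PATH, name=self.DATASET_NAME)
        else:
            self.dataset = dataset

prepared = load_dataset()        # loaded once, up front
task = Task(dataset=prepared)    # injected: no second load inside __init__
print(task.dataset is prepared)  # True
```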
If this looks good, I am eager to double-check everything and make a pull request (:
Error

```
Traceback (most recent call last):
  File "main.py", line 396, in
```
@pearlmary it seems you're having trouble loading the humaneval dataset. You should have seen this warning above the error:
"Loading the dataset failed with {str(e)}. This task will use a locally downloaded dataset, not from the HF hub."
We can't throw an error because some datasets are local, hence the warning (but I'll try to make it more informative). Can you try installing the requirements again? We also faced this issue at one point due to a bug in a recent version of fsspec, which is now fixed in the requirements.
@mu-arkhipov I'm not sure your solution addresses the issue. It's true that MultiPL-E calls `load_dataset` again to get the stop words, but that shouldn't be a problem since it's a very small dataset and it will be cached for future calls.
I encountered the same issue as you, and I resolved it by executing 'huggingface-cli login'.
When I evaluated HumanEval with LLaMA 7B, I ran into this problem:
my script:

```shell
accelerate launch /cpfs01/shared/Group-m6/dongguanting.dgt/bigcode-evaluation-harness/main.py \
    --model "/path to my llama7b/llama-7b" \
    --tasks humaneval \
    --max_length_generation 512 \
    --do_sample True \
    --n_samples 200 \
    --batch_size 100 \
    --temperature 0.2 \
    --precision bf16 \
    --allow_code_execution \
    --use_auth_token
```