bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

'HumanEval' object has no attribute 'dataset' #131

Closed dongguanting closed 4 months ago

dongguanting commented 10 months ago

When I evaluated HumanEval with LLaMA-7B, I ran into this problem:

(screenshot: traceback ending in AttributeError: 'HumanEval' object has no attribute 'dataset')

My script:

accelerate launch /cpfs01/shared/Group-m6/dongguanting.dgt/bigcode-evaluation-harness/main.py \
  --model "/path to my llama7b/llama-7b" \
  --tasks humaneval \
  --max_length_generation 512 \
  --do_sample True \
  --n_samples 200 \
  --batch_size 100 \
  --temperature 0.2 \
  --precision bf16 \
  --allow_code_execution \
  --use_auth_token

loubnabnl commented 10 months ago

Hi, I can't reproduce the error. Are you using main? Maybe the dataset wasn't downloaded successfully; could you try again? Also, make sure you don't have another installed package called HumanEval by any chance.
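A quick way to narrow this down is to load the dataset directly in the same environment, which separates download/authentication problems from harness problems. This is a sketch: it assumes HumanEval resolves to the "openai_humaneval" hub dataset and that the task module lives at lm_eval.tasks.humaneval as in the repo layout linked below (adjust the module path if your checkout uses bigcode_eval instead):

from datasets import load_dataset

# Load the dataset outside the harness; a failure here points at the
# hub download (network, auth, cache permissions), not the Task class.
ds = load_dataset("openai_humaneval")
print(ds)  # expect a DatasetDict with a 164-row "test" split

# Check that nothing on sys.path shadows the harness's task module.
import importlib
print(importlib.import_module("lm_eval.tasks.humaneval").__file__)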

mu-arkhipov commented 10 months ago

I have faced the same problem. It appears to originate at https://github.com/bigcode-project/bigcode-evaluation-harness/blob/68aea1805283b123a4e745e696e041c47cca4993/lm_eval/base.py#L27: if the load_dataset function raises an error, only a warning message is produced, and that warning is hidden by the traceback in @dongguanting's screenshot. In my case the problem was with access rights to the dataset folder.

The fact that neither I, nor you @loubnabnl, nor @dongguanting figured out what was wrong right away suggests that this part of the code should be refactored. The behavior of the Task class here is quite implicit, while the Zen of Python tells us that "Explicit is better than implicit". The implicit assumption is that the dataset attribute can be set outside of the base Task class. The only task that relies on this is MultiPL-E: https://github.com/bigcode-project/bigcode-evaluation-harness/blob/68aea1805283b123a4e745e696e041c47cca4993/lm_eval/tasks/multiple.py#L89 Note that the load_dataset function will be called twice here. Possible bug?

I believe a better way to allow creating the dataset outside of the Task is to make dataset an optional argument of __init__ with a default value of None: if a dataset is passed in, we skip the self.dataset = load_dataset(...) call.

To illustrate, this is how the code looks now:

from abc import ABC
from warnings import warn

from datasets import load_dataset


class Task(ABC):
    ...
    def __init__(self, stop_words=None, requires_execution=True):
        ...
        try:
            self.dataset = load_dataset(path=self.DATASET_PATH, name=self.DATASET_NAME)
        except Exception as e:
            # On failure only a warning is emitted and self.dataset is never
            # assigned, so any later access raises AttributeError.
            warn(
                f"Loading the dataset failed with {str(e)}. This task will use a locally downloaded dataset, not from the HF hub."
            )
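To make the failure mode concrete, here is a minimal, self-contained sketch (not the harness's actual classes) of how the swallowed error resurfaces later as a seemingly unrelated AttributeError:

from warnings import warn


class Demo:
    def __init__(self):
        try:
            raise OSError("Permission denied")  # stand-in for a failing load_dataset
        except Exception as e:
            warn(f"Loading the dataset failed with {str(e)}.")  # easy to miss in logs
        # Note: self.dataset is never assigned on the failure path.

    def get_dataset(self):
        return self.dataset["test"]


Demo().get_dataset()  # AttributeError: 'Demo' object has no attribute 'dataset'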

And here is what I propose:

class Task(ABC):
    ...
    def __init__(self, stop_words=None, requires_execution=True, dataset=None):
        ...
        if dataset is None:
            self.dataset = load_dataset(path=self.DATASET_PATH, name=self.DATASET_NAME)
        else:
            # The caller built the dataset itself (as MultiPL-E does), so
            # take it as-is instead of loading it a second time.
            self.dataset = dataset
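For illustration, a hypothetical caller under the proposed signature would look like the sketch below. MyTask and its stop words are made-up names, only the nuprl/MultiPL-E dataset coordinates are real, and the real Task also declares abstract methods (elided here) that a usable subclass would have to implement:

from datasets import load_dataset

class MyTask(Task):
    DATASET_PATH = "nuprl/MultiPL-E"
    DATASET_NAME = "humaneval-py"

# Build the dataset once, outside the Task...
dataset = load_dataset(path="nuprl/MultiPL-E", name="humaneval-py")

# ...and pass it in, so __init__ skips its own load_dataset call and the
# same data is not loaded twice.
task = MyTask(stop_words=["\ndef"], requires_execution=True, dataset=dataset)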

If this looks good, I am eager to double-check everything and make a pull request (:

pearlmary commented 6 months ago

Error:

Traceback (most recent call last):
  File "main.py", line 396, in <module>
    main()
  File "main.py", line 380, in main
    results[task] = evaluator.evaluate(
  File "/workspace/bigcode-evaluation-harness/bigcode_eval/evaluator.py", line 95, in evaluate
    generations, references = self.generate_text(task_name, intermediate_generations=intermediate_generations)
  File "/workspace/bigcode-evaluation-harness/bigcode_eval/evaluator.py", line 45, in generate_text
    dataset = task.get_dataset()
  File "/workspace/bigcode-evaluation-harness/bigcode_eval/tasks/humaneval.py", line 62, in get_dataset
    return self.dataset["test"]
AttributeError: 'HumanEval' object has no attribute 'dataset'

Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/opt/conda/lib/python3.8/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "/opt/conda/lib/python3.8/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python3.8', 'main.py', '--model', 'CodeLlama-7b-Instruct-hf', '--max_length_generation', '512', '--limit', '3', '--tasks', 'humaneval', '--temperature', '0.2', '--n_samples', '200', '--batch_size', '2', '--allow_code_execution']' returned non-zero exit status 1.

loubnabnl commented 5 months ago

@pearlmary it seems you're having trouble loading the HumanEval dataset; you should have seen this warning above the error:

"Loading the dataset failed with {str(e)}. This task will use a locally downloaded dataset, not from the HF hub."

We can't throw an error because some datasets are local, hence the warning (but I'll try to make it more informative). Can you try installing the requirements again? We also faced this issue at one point due to a bug in a recent version of fsspec, which is now fixed in the requirements.
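Before reinstalling, a quick way to check whether the environment has drifted from the pins is to print the versions involved in dataset loading and compare them against the harness's requirements.txt (a sketch; the exact pinned versions depend on your checkout):

import datasets
import fsspec

# Compare these against the pins in requirements.txt; a mismatched
# fsspec has broken dataset loading here before.
print("datasets:", datasets.__version__)
print("fsspec:", fsspec.__version__)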

@mu-arkhipov I'm not sure your solution addresses the issue. It's true that MultiPL-E loads the dataset again to get the stop words, but that shouldn't be a problem since it's a very small dataset and it will be cached for future calls.

AnitaLiu98 commented 4 months ago

I encountered the same issue, and I resolved it by running huggingface-cli login.
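The same login can also be done from Python via the huggingface_hub library's login() helper, which stores the token that datasets uses for authenticated downloads:

from huggingface_hub import login

# Equivalent to running `huggingface-cli login` in a shell: prompts for a
# token (created at https://huggingface.co/settings/tokens) and saves it
# locally so gated or private hub datasets can be downloaded.
login()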