bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Error when testing codegeex4-all-9b model #264

Open Gumingbro opened 3 months ago

Gumingbro commented 3 months ago

Hello, guys. I've run into a problem that has confused me for a long time. I want to test codegeex4-all-9b. Below are my command and running log, but I got 0 as the result, and there is no code output in the result file.

accelerate launch main.py \
  --model THUDM/codegeex4-all-9b \
  --max_length_generation 4096 \
  --tasks humaneval \
  --trust_remote_code \
  --temperature 0.2 \
  --n_samples 1 \
  --batch_size 1 \
  --allow_code_execution \
  --left_padding

The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 1
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
Selected Tasks: ['humaneval']
Loading model in fp32
/Miniconda/envs/evalcodegeex/lib/python3.8/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
  warnings.warn(
Loading checkpoint shards: 100%|██████████| 4/4 [00:21<00:00, 5.48s/it]
number of problems for this task is 164
100%|██████████| 164/164 [00:19<00:00, 8.26it/s]
Evaluating generations...
{
  "humaneval": {
    "pass@1": 0.0
  },
  "config": {
    "prefix": "",
    "do_sample": true,
    "temperature": 0.2,
    "top_k": 0,
    "top_p": 0.95,
    "n_samples": 1,
    "eos": "<|endoftext|>",
    "seed": 0,
    "model": "THUDM/codegeex4-all-9b",
    "modeltype": "causal",
    "peft_model": null,
    "revision": null,
    "use_auth_token": false,
    "trust_remote_code": true,
    "tasks": "humaneval",
    "instruction_tokens": null,
    "batch_size": 1,
    "max_length_generation": 4096,
    "precision": "fp32",
    "load_in_8bit": false,
    "load_in_4bit": false,
    "left_padding": true,
    "limit": null,
    "limit_start": 0,
    "save_every_k_tasks": -1,
    "postprocess": true,
    "allow_code_execution": true,
    "generation_only": false,
    "load_generations_path": null,
    "load_data_path": null,
    "metric_output_path": "evaluation_results.json",
    "save_generations": false,
    "load_generations_intermediate_paths": null,
    "save_generations_path": "generations.json",
    "save_references": false,
    "save_references_path": "references.json",
    "prompt": "prompt",
    "max_memory_per_gpu": null,
    "check_references": false
  }
}

loubnabnl commented 2 months ago

Can you check what the solutions look like in the generations.json file? Or do you mean they are empty?
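
Note that the config dump above shows "save_generations": false, so generations.json may not have been written at all. A minimal sketch of a re-run that also saves the generations, assuming the CLI flags mirror the config keys printed in the log:

accelerate launch main.py \
  --model THUDM/codegeex4-all-9b \
  --max_length_generation 4096 \
  --tasks humaneval \
  --trust_remote_code \
  --temperature 0.2 \
  --n_samples 1 \
  --batch_size 1 \
  --allow_code_execution \
  --left_padding \
  --save_generations \
  --save_generations_path generations.json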

(side note: you don't need --max_length_generation 4096; 512 is usually enough for HumanEval)
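
Once the generations are saved, a quick way to eyeball the first solution is something like the line below (this assumes the file lands at the save_generations_path from the config and is a list of lists, one inner list of n_samples solutions per problem):

python -c "import json; print(json.load(open('generations.json'))[0][0])"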