bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Add evaluation for WizardCoder models (which are CodeLlama-based) #133

Closed: loubnabnl closed this issue 10 months ago

loubnabnl commented 10 months ago
xuehui1991 commented 7 months ago

@loubnabnl Hello loubnabnl, I have a question about this PR. I used the following command to test WizardCoder in bigcode-evaluation-harness, and inference takes almost 13 hours (it takes only a few minutes for CodeLlama). Can you give me some suggestions? Thanks a lot.

The command is as follows:

MODEL_PATH="WizardLM/WizardCoder-Python-13B-V1.0"
MODEL_NAME="WizardCoder-Python-13B-V1.0-instructprompt"

accelerate launch --num_processes=6 main.py \
  --model ${MODEL_PATH} \
  --max_length_generation 512 \
  --tasks instruct_wizard_humaneval \
  --temperature 0.1 \
  --n_samples 1 \
  --batch_size 10 \
  --allow_code_execution \
  --save_generations \
  --save_generations_path results/generations_${MODEL_NAME}.json \
  --metric_output_path results/evaluation_${MODEL_NAME}.json \
  --use_auth_token
loubnabnl commented 7 months ago

Hi, that's because the model has use_cache set to False here: https://huggingface.co/WizardLM/WizardCoder-Python-13B-V1.0/blob/main/config.json#L25. You can force it to True by passing use_cache=True when loading the model. Maybe we should enable it by default in the harness when it isn't set.
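
For reference, a minimal sketch of that override with the transformers library (the model name is taken from the command above; from_pretrained forwards unrecognized kwargs such as use_cache to the model config):

from transformers import AutoModelForCausalLM

# The checkpoint's config.json ships with use_cache=False, so generation
# recomputes all past key/values at every decoding step. Overriding the
# flag restores the KV cache and normal autoregressive decoding speed.
model = AutoModelForCausalLM.from_pretrained(
    "WizardLM/WizardCoder-Python-13B-V1.0",
    use_cache=True,  # override the config default of False
)

With the cache enabled, each generation step only computes attention for the new token instead of re-encoding the whole prefix, which explains the minutes-versus-hours gap the comment above describes.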