Closed: loubnabnl closed this 10 months ago
@loubnabnl Hello loubnabnl, I have a question about this PR. I used the following command to test WizardCoder in bigcode-evaluation-harness, and the inference takes almost 13 hours (it only takes a few minutes for CodeLlama). Can you give me some suggestions? Thanks a lot.
The command is as follows:
MODEL_PATH="WizardLM/WizardCoder-Python-13B-V1.0"
MODEL_NAME="WizardCoder-Python-13B-V1.0-instructprompt"
accelerate launch --num_processes=6 main.py \
--model ${MODEL_PATH} \
--max_length_generation 512 \
--tasks instruct_wizard_humaneval \
--temperature 0.1 \
--n_samples 1 \
--batch_size 10 \
--allow_code_execution \
--save_generations \
--save_generations_path results/generations_${MODEL_NAME}.json \
--metric_output_path results/evaluation_${MODEL_NAME}.json \
--use_auth_token
Hi, that's because the model has `use_cache` set to `False` here: https://huggingface.co/WizardLM/WizardCoder-Python-13B-V1.0/blob/main/config.json#L25. You can force it to `True` by passing `use_cache=True` when loading the model. Maybe we should turn it on by default in the harness if it isn't already.
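For reference, here is a minimal sketch of what forcing the cache on looks like when loading the model directly with `transformers` (the model name is taken from the command above; the exact loading arguments used inside the harness may differ):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "WizardLM/WizardCoder-Python-13B-V1.0"

tokenizer = AutoTokenizer.from_pretrained(model_path)

# use_cache=True overrides the value in the model's config.json (False for this
# checkpoint), so past key/values are reused during generation instead of being
# recomputed at every decoding step, which is what makes generation so slow otherwise.
model = AutoModelForCausalLM.from_pretrained(model_path, use_cache=True)
```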
There's also another issue: the tokenizer adds an `<s>` token at the beginning of the tokenized text but has `bos_token` set to `</s>`, which impacts the post-processing (see issue).
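A quick way to check this for yourself (just a sketch, not part of the harness; the printed values reflect the behavior described above and may change if the tokenizer config is fixed upstream):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("WizardLM/WizardCoder-Python-13B-V1.0")

# Per the comment above, bos_token is configured as "</s>" even though the
# tokenizer actually prepends "<s>" to the encoded text, so any post-processing
# that strips everything before bos_token can misbehave.
print(tokenizer.bos_token)  # reportedly "</s>"
print(tokenizer.decode(tokenizer("def add(a, b):")["input_ids"]))  # starts with "<s>"
```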