Closed: loubnabnl closed this 10 months ago
@loubnabnl Hello loubnabnl, I have a question about this PR. I used the following command to test WizardCoder in bigcode-evaluation-harness, and the inference takes almost 13 hours (it only takes a few minutes for CodeLlama). Can you give me some suggestions? Thanks a lot.
The command is as follows:
MODEL_PATH="WizardLM/WizardCoder-Python-13B-V1.0"
MODEL_NAME="WizardCoder-Python-13B-V1.0-instructprompt"
accelerate launch --num_processes=6 main.py \
--model ${MODEL_PATH} \
--max_length_generation 512 \
--tasks instruct_wizard_humaneval \
--temperature 0.1 \
--n_samples 1 \
--batch_size 10 \
--allow_code_execution \
--save_generations \
--save_generations_path results/generations_${MODEL_NAME}.json \
--metric_output_path results/evaluation_${MODEL_NAME}.json \
--use_auth_token
Hi, that's because the model has `use_cache` set to `False` here: https://huggingface.co/WizardLM/WizardCoder-Python-13B-V1.0/blob/main/config.json#L25. You can force it to `True` by passing `use_cache=True` when loading the model. Maybe we should turn it on by default in the harness if it isn't already.
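For reference, here is a minimal sketch of what forcing the cache on looks like when loading the model directly with `transformers` (the model name is taken from the command above; the exact loading arguments used inside the harness may differ):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "WizardLM/WizardCoder-Python-13B-V1.0"

tokenizer = AutoTokenizer.from_pretrained(model_path)

# use_cache=True overrides the value in the model's config.json (False for this
# checkpoint), so past key/values are reused during generation instead of being
# recomputed at every decoding step, which is what makes generation so slow otherwise.
model = AutoModelForCausalLM.from_pretrained(model_path, use_cache=True)
```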
There's also another issue: the tokenizer adds an `<s>` token at the beginning of the tokenized text but has `bos_token` set to `</s>`, which impacts the post-processing (see issue).
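A quick way to check this for yourself (just a sketch, not part of the harness; the printed values reflect the behavior described above and may change if the tokenizer config is fixed upstream):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("WizardLM/WizardCoder-Python-13B-V1.0")

# Per the comment above, bos_token is configured as "</s>" even though the
# tokenizer actually prepends "<s>" to the encoded text, so any post-processing
# that strips everything before bos_token can misbehave.
print(tokenizer.bos_token)  # reportedly "</s>"
print(tokenizer.decode(tokenizer("def add(a, b):")["input_ids"]))  # starts with "<s>"
```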