AadSah closed this issue 1 year ago
The Replit model seems slow because the `use_cache` argument in their config is set to false. You can try cloning the repo and changing it to true before you run inference; I opened a PR on their repo. For CodeGen2, which model exactly are you running?
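For reference, a minimal sketch of the config change after cloning the model repo locally (the path and helper name below are hypothetical; this just flips `use_cache` in the cloned `config.json` so the KV cache is reused during generation):

```python
import json

def enable_kv_cache(config_path):
    """Set use_cache=True in a cloned model's config.json (sketch)."""
    with open(config_path) as f:
        cfg = json.load(f)
    cfg["use_cache"] = True  # reuse past key/value states during generation
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)

# Hypothetical local clone path:
# enable_kv_cache("replit-code-v1-3b/config.json")
```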
Also, what batch size and how many GPUs are you using? Increasing the batch size can also speed things up.
Hi @loubnabnl, thanks for your reply! I am running the CodeGen2-3.7B model with a batch size of 10 and a single GPU. Here is the exact command which I am using:
```shell
accelerate launch main.py \
  --model Salesforce/codegen2-3_7B \
  --tasks mbpp \
  --temperature 0.1 \
  --n_samples 15 \
  --batch_size 10 \
  --allow_code_execution \
  --save_generations \
  --metric_output_path codegen2-3.7B-results.json \
  --save_generations_path codegen2-3.7B-generations.json \
  --trust_remote_code
```
Regarding the Replit model, you should be able to run the evaluation in ~2h on 1 GPU in full precision. You can try fp16 or bf16 via the `--precision` argument to speed things up.
As for CodeGen2, if the model's inference is slow there's not much we can do about it from the evaluation-harness perspective, since the same command runs fast for other models. You can try measuring the tokens/sec throughput and contacting the model's authors.
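A quick way to measure tokens/sec is to time one generation call and divide the number of new tokens by the elapsed wall-clock time. A minimal sketch, where `generate_fn` is a hypothetical stand-in for your model's generation call and is assumed to return the number of tokens it produced:

```python
import time

def tokens_per_second(generate_fn, prompt, max_new_tokens=128):
    """Rough throughput probe: time one call to generate_fn and
    divide the tokens it produced by wall-clock seconds."""
    start = time.perf_counter()
    n_tokens = generate_fn(prompt, max_new_tokens)  # hypothetical callable
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```

Comparing this number across models (e.g. CodeGen2-3.7B vs. a similarly sized model) makes it easy to show the slowdown concretely when reporting it upstream.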
I can confirm that inference with the CodeGen2 series is extremely slow compared to other models of the same size. CodeGen2.5 (7B), on the other hand, is fast (but it is based on a different architecture).
Hi, I have been trying to evaluate the CodeGen2 and Replit-Code models on the mbpp task, but the code runs extremely slowly. While the corresponding eval time for other models is around 2 hours, the ETA for these two models varies significantly and sometimes exceeds 90 hours. Any help resolving this issue? Thanks!