Can you run accelerate config to enable GPU usage instead of the CPU?
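As a quick sanity check (a small sketch, assuming the accelerate CLI is available in the notebook), you can print the active configuration from a cell to confirm it picked up the GPU:

!accelerate env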
Hi @loubnabnl,
I ran accelerate config with these settings:
But I am still getting the same problem:
Is there any setting I am choosing wrong? Also, when checking my GPU with nvidia-smi, I can see the GPU is there, but it is in the Off state. Does this matter? If it does, how do I turn it on? Thank you!
It seems that in a Jupyter environment you need to manually create the yaml config file and reference it in your accelerate launch command:
1- create config.yaml:
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: no
num_machines: 1
num_processes: 1
gpu_ids: all
use_cpu: false
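If you are working in a notebook, one way to create this file without leaving the environment (a sketch using IPython's %%writefile cell magic) is:

%%writefile config.yaml
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: no
num_machines: 1
num_processes: 1
gpu_ids: all
use_cpu: false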
2- run
!accelerate launch --config_file config.yaml main.py \
--model bigcode/starcoderbase-1b \
--max_length_generation 512 \
--tasks humaneval \
--temperature 0.2 \
--limit 50 \
--n_samples 20 \
--batch_size 20 \
--use_auth_token \
--allow_code_execution
This works for me: colab
Thanks a lot @loubnabnl. This worked for me too. I was able to run the bigcode/starcoderbase-1b model on colab with a single T4 GPU. I then tried the bigcode/starcoderbase-3b model on colab with a single T4 GPU, but it failed to run due to OOM. So I ran it in a kaggle kernel with 2 T4 GPUs and was able to run it with one change: I changed --limit from 50 to 10.
Also, I have one question: if I quantize the model to GGUF format, is it possible to run the benchmark on that format too?
Hi, we don't support GGUF format, but you can try using 4bit or 8bit precision to reduce the memory footprint when loading the model, using the flag --load_in_4bit for example.
Btw --limit shouldn't impact the memory; that's just the number of HumanEval problems to use. It's the --batch_size flag, which determines how many samples to fit in a batch (out of n_samples), that you should lower; try setting it to 1 for the lowest memory consumption (but eval will be slower).
For n_samples: if you're using greedy decoding (do_sample False), set it to 1 because you don't sample; if you use sampling (do_sample True and temperature 0.2), then 20 should be enough for an accurate number.
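For example (a sketch combining the flags discussed above; --load_in_4bit assumes the bitsandbytes package is installed), a lowest-memory variant of the earlier command could look like:

!accelerate launch --config_file config.yaml main.py \
--model bigcode/starcoderbase-3b \
--max_length_generation 512 \
--tasks humaneval \
--temperature 0.2 \
--limit 50 \
--n_samples 20 \
--batch_size 1 \
--load_in_4bit \
--use_auth_token \
--allow_code_execution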
Thank you for the clarification.
I was using bigcode-evaluation-harness to run the HumanEval benchmark on the CodeLlama model.
I am using this command from the docs README:
accelerate launch main.py \
--model codellama/CodeLlama-7b-hf \
--max_length_generation 200 \
--tasks humaneval \
--temperature 0.2 \
--n_samples 200 \
--batch_size 10 \
--allow_code_execution
It successfully downloads the model shards, but when loading the checkpoint shards, I get the error "raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)". Whether I use a T4 in colab or 2x T4 in kaggle, I still get this error.
Full error log:
When investigating, I found that even with the GPU turned on, it is still only using the system RAM.
Here is my colab link.
How do I resolve this? Is there any more setup I need to do in order to run this successfully? Thank you!
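One quick diagnostic (a minimal sketch using PyTorch's CUDA utilities; not specific to the harness) is to confirm the runtime actually sees the GPU before launching, since a False here would mean the run silently falls back to CPU and system RAM:

!nvidia-smi
!python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"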