bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0
795 stars · 214 forks

Query around n_samples argument #48

Closed murthyrudra closed 1 year ago

murthyrudra commented 1 year ago

Hi, I am performing code generations using the following command

accelerate launch  main.py --model bigcode/santacoder --tasks humaneval --max_length_generation 256 \
--temperature 0.8 --top_p 0.95 --do_sample True --generation_only --n_samples 100 --batch_size 32 \
--output_generations generations/santacoder_temperature_0.8_top_p_0.95_task_humaneval.json \
--save_generations --allow_code_execution --trust_remote_code

I am expecting the number of candidate generations per task to be around 100. However, on inspecting the generations/santacoder_temperature_0.8_top_p_0.95_task_humaneval.json file I see that there are 96 generations per task.

Is there something I am missing? Thanks

loubnabnl commented 1 year ago

Hi, can you share how many GPUs you ran the generation on, and whether you got any warnings during the process?

murthyrudra commented 1 year ago

We are using 1 A100 80GB GPU for the inference. The accelerate config used is specified below

$ accelerate config
In which compute environment are you running? ([0] This machine, [1] AWS (Amazon SageMaker)): 0
Which type of machine are you using? ([0] No distributed training, [1] multi-CPU, [2] multi-GPU, [3] TPU [4] MPS): 2
How many different machines will you use (use more than 1 for multi-node training)? [1]: 1
Do you want to use DeepSpeed? [yes/NO]: NO
Do you want to use FullyShardedDataParallel? [yes/NO]: NO
How many GPU(s) should be used for distributed training? [1]:1
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:all
Do you wish to use FP16 or BF16 (mixed precision)? [NO/fp16/bf16]: bf16

This is the log message when I run the command

$ accelerate launch  main.py --model bigcode/santacoder --tasks humaneval --max_length_generation 256 \ 
--temperature 0.8 --top_p 0.95 --do_sample True --generation_only --n_samples 100 --batch_size 32 \
--output_generations new_generations/santacoder_temperature_0.8_top_p_0.95_task_humaneval.json \
--save_generations --allow_code_execution --trust_remote_code

The following values were not passed to `accelerate launch` and had defaults used instead:
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.

Selected Tasks: ['humaneval']
Loading the model and tokenizer
generation mode only
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  6.09it/s]
Generating solutions for task 164 
number of problems for this task is 164

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 492/492 [25:02<00:00,  3.05s/it]
generations were saved

loubnabnl commented 1 year ago

I see, the issue is that your n_samples (100) isn't divisible by your batch_size (32). This case should be handled here, when defining n_copies: in this case it was 3 when it should have been 4. While this gets fixed, you can use a batch_size of 10, 20, 50, or even 100. Since you have 80GB, a batch size of 100 with SantaCoder should fit in memory and would be much faster.
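The truncation described above can be illustrated with a minimal sketch. This is assumed logic mirroring the explanation, not the harness's exact code; the variable name `n_copies` follows the comment:

```python
# Sketch of the copy-count arithmetic (assumed, not the harness's exact code).
# With integer division, the number of prompt copies is truncated, so some
# of the requested samples are silently dropped.
import math

n_samples = 100
batch_size = 32
n_tasks = 164  # HumanEval problem count, as in the log above

n_copies = n_samples // batch_size            # truncates to 3
generations_per_task = n_copies * batch_size  # 3 * 32 = 96, not 100
total_batches = n_tasks * n_copies            # 164 * 3 = 492, matching the log

# Rounding up instead would over-generate but never under-generate:
n_copies_fixed = math.ceil(n_samples / batch_size)  # 4

print(generations_per_task, total_batches, n_copies_fixed)
```

This also explains the progress bar in the log: 492 batches is exactly 164 tasks × 3 copies.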

murthyrudra commented 1 year ago

Thanks, I will go ahead with n_samples = 100 and batch size = 100.

murthyrudra commented 1 year ago

Hi, this setting worked for the SantaCoder model. I am running a similar experiment with the CodeGen-2B model, but using a batch_size of 100 gives me a CUDA OOM error. If I need 100 samples generated, what values should I use for batch_size and n_samples?

loubnabnl commented 1 year ago

Hi, as I said, n_samples just needs to be divisible by the batch size. For example, for n_samples = 100 you can use a batch size of 2, 5, 10, 20, 25…
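A quick way to enumerate the batch sizes compatible with a given n_samples (a sketch of the divisibility rule above, not harness code):

```python
# List batch sizes that divide n_samples evenly, so that
# n_copies * batch_size == n_samples and no generations are dropped.
n_samples = 100
valid_batch_sizes = [b for b in range(1, n_samples + 1) if n_samples % b == 0]
print(valid_batch_sizes)  # [1, 2, 4, 5, 10, 20, 25, 50, 100]
```

Any of these avoids the truncation; pick the largest one that fits in GPU memory.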

murthyrudra commented 1 year ago

Thanks, it's working.