bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

args.batch_size vs args.num_samples #140

Closed: awasthiabhijeet closed this issue 9 months ago

awasthiabhijeet commented 9 months ago

Looking at the implementation, it seems batch_size is playing the role of num_samples.

For instance, here, the comment says that:

do not confuse args.batch_size, which is actually the num_return_sequences

Similarly, I do not find args.num_samples being passed anywhere in the calls to model.generate(). In fact, the calls to model.generate() in utils.py set num_return_sequences=batch_size.

In contrast, main.py describes batch size as

Batch size for evaluation on each worker, can be larger for HumanEval

Is this a bug?

CC: @loubnabnl

loubnabnl commented 9 months ago

args.n_samples is the number of solutions you want to generate for each prompt. However, you can't always use it directly as num_return_sequences, since that might OOM for large models, for example. That's why we introduced the other argument, args.batch_size, which defines the effective number of sequences you ask your model to generate in one call. This is repeated until we reach the args.n_samples that the user requested, as defined by n_copies here
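The chunking described above can be sketched roughly as follows. This is a simplified illustration, not the actual harness code: the function name `plan_generations` is made up here, and the real implementation computes `n_copies` and duplicates prompts inside its data loader before calling `model.generate(num_return_sequences=batch_size, ...)`.

```python
import math

def plan_generations(n_samples: int, batch_size: int):
    """Illustrative sketch of the n_copies logic (hypothetical helper).

    Each model.generate() call returns `batch_size` sequences
    (num_return_sequences=batch_size), so the prompt is queued
    `n_copies` times to meet or exceed the requested `n_samples`.
    """
    n_copies = math.ceil(n_samples / batch_size)
    total_generated = n_copies * batch_size
    # Any completions beyond n_samples can simply be discarded.
    return n_copies, total_generated

# e.g. 10 solutions per prompt, but batch_size=4 to avoid OOM:
copies, total = plan_generations(10, 4)  # 3 calls of 4 sequences -> 12
```

So batch_size bounds the memory cost of a single generate() call, while n_samples stays the user-facing count of solutions per prompt.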

awasthiabhijeet commented 9 months ago

Thank you for the explanation, @loubnabnl !