class TokenizedDataset(IterableDataset):
    def __iter__(self):
        ...
        if self.n_copies == 1 and self.n_tasks % self.num_devices != 0:
            self.n_copies = 2
            warnings.warn(
                "n_copies (n_samples/batch_size) was changed from 1 to 2 because n_tasks isn't proportional to num devices"
            )
        ...
【Results】
I get the UserWarning: n_copies (n_samples/batch_size) was changed from 1 to 2 because n_tasks isn't proportional to num devices.
The harness then generated 2 samples for 163 tasks and 4 samples for 1 task, before removing the extra predictions to keep only nsamples=1.
Why not keep n_copies=1 and just generate 1 sample for 163 tasks and 2 samples for 1 task? That makes n_tasks = 1 * 163 + 2 * 1 = 165, which also ensures n_tasks % num_devices == 0.
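To make the comparison concrete, here is a small sketch of the arithmetic only. `padded_total` is a hypothetical helper, not a function from the harness, and `num_devices = 5` is an assumed value, chosen only because it is consistent with the counts reported above.

```python
def padded_total(n_tasks: int, n_copies: int, num_devices: int) -> int:
    """Smallest multiple of num_devices that is >= n_tasks * n_copies.

    Hypothetical helper for illustration; not part of bigcode_eval.
    """
    total = n_tasks * n_copies
    # Integer ceiling division, then scale back up to a multiple of num_devices.
    return -(-total // num_devices) * num_devices


n_tasks = 164      # 163 tasks + 1 task, as in the counts above
num_devices = 5    # assumption: not stated in this report

# Current behaviour: n_copies is bumped from 1 to 2 before padding.
current = padded_total(n_tasks, n_copies=2, num_devices=num_devices)

# Proposed behaviour: keep n_copies = 1 and pad only the remainder.
proposed = padded_total(n_tasks, n_copies=1, num_devices=num_devices)

print(f"current:  {current} generations ({current - n_tasks} discarded)")
print(f"proposed: {proposed} generations ({proposed - n_tasks} discarded)")
```

Under that assumption, the current behaviour produces 2 * 163 + 4 * 1 = 330 generations for 164 kept samples, while padding with n_copies=1 would produce 165.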
【Related Code】 bigcode_eval/utils.py
【My Setting】