w4-jonghoon closed this issue 3 years ago
In our implementation, `batch_size` and `batch_size_per_gpu` mean the number of audio frames in one batch, not the number of samples. So it is recommended to choose a value greater than the number of frames in the longest audio sample.
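To make the frame-based meaning concrete, here is a minimal sketch of batching under a frame budget. It is not the NeurST implementation: the function name `batch_by_frames` and the example lengths are hypothetical, and it sums unpadded lengths for simplicity, whereas a real pipeline may count padded frames instead.

```python
from typing import List


def batch_by_frames(frame_lengths: List[int], batch_size: int) -> List[List[int]]:
    """Group sample indices so each batch holds at most `batch_size` frames.

    Hypothetical helper for illustration; assumes `frame_lengths` is non-empty.
    """
    longest = max(frame_lengths)
    # A sample longer than the frame budget could never fit into any batch,
    # which is why the budget must be at least the longest sample.
    assert batch_size >= longest, (
        f"batch_size ({batch_size}) must be >= the longest sample ({longest} frames)"
    )

    batches: List[List[int]] = []
    current: List[int] = []
    used = 0  # frames already placed in the current batch
    for idx, n_frames in enumerate(frame_lengths):
        # Start a new batch when adding this sample would exceed the budget.
        if current and used + n_frames > batch_size:
            batches.append(current)
            current, used = [], 0
        current.append(idx)
        used += n_frames
    if current:
        batches.append(current)
    return batches


# Four samples with made-up frame counts and a budget of 2000 frames.
batches = batch_by_frames([300, 500, 1200, 800], batch_size=2000)
print(batches)  # [[0, 1, 2], [3]]
```

With `batch_size=64` the same call fails the assertion, because every sample here is longer than 64 frames.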
Thank you @zhaocq-nlp.
I ran into that case in my MuST-C en-zh experiment.
I should figure out why the batch size is set to just 64.
Thank you^^
@w4-jonghoon However, this applies only to training. During inference or evaluation, `batch_size` means the number of samples. It may be a little confusing. I will try to make it clearer. Thank you for the reminder.
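As a rough illustration of the two meanings, the sketch below estimates how many samples end up in one batch for the same `batch_size` value. The helper `samples_per_batch` and the assumption that every sample has the same number of frames are mine, purely for illustration.

```python
def samples_per_batch(batch_size: int, frames_per_sample: int, is_training: bool) -> int:
    """Estimate how many equal-length samples fit in one batch.

    Hypothetical helper: assumes every sample has `frames_per_sample` frames.
    """
    if is_training:
        # Training: batch_size is a frame budget shared by all samples.
        return batch_size // frames_per_sample
    # Inference/evaluation: batch_size is directly the number of samples.
    return batch_size


print(samples_per_batch(64, 1000, is_training=True))   # 0
print(samples_per_batch(64, 1000, is_training=False))  # 64
```

A value such as 64 is therefore reasonable for inference but leaves no room for even one 1000-frame sample during training.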
I think it is more acceptable. Thank you.
https://github.com/bytedance/neurst/blob/20bc196211f4e09d63ab4b0b1a42c4c62514c052/neurst/tasks/speech2text.py#L298
In the code lines above, you assert when the batch size is less than the maximum source length. AFAIK, `batch_size` means the number of samples in one batch, and the max source length is the maximum number of frames (signals) across source samples, aren't they? Then why should the batch size be greater than the max source length?