bytedance / neurst

Neural end-to-end Speech Translation Toolkit

batch size should be greater than max source len? why? #36

Closed w4-jonghoon closed 3 years ago

w4-jonghoon commented 3 years ago

https://github.com/bytedance/neurst/blob/20bc196211f4e09d63ab4b0b1a42c4c62514c052/neurst/tasks/speech2text.py#L298

The code linked above raises an assertion error when the batch size is less than the maximum source length. AFAIK, batch_size is the number of samples processed in one batch, and the max source length is the maximum number of frames (signals) across the source samples, isn't it? If so, why should the batch size be greater than the max source length?

zhaocq-nlp commented 3 years ago

In our implementation, batch_size and batch_size_per_gpu mean the number of audio frames in one batch, not the number of samples. So it is recommended to choose a value greater than the number of frames in the longest audio sample.
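To illustrate why a frame-based batch size has to cover the longest sample, here is a minimal sketch of batching by a frame budget. It is not NeurST's actual code: the function name `batch_by_frames` and the greedy grouping are assumptions made for illustration, and it counts the plain sum of frames, ignoring padding.

```python
from typing import List


def batch_by_frames(frame_counts: List[int], batch_size: int) -> List[List[int]]:
    """Group sample indices so each batch holds at most `batch_size` frames.

    Hypothetical helper for illustration only; `batch_size` is a frame
    budget here, not a number of samples.
    """
    longest = max(frame_counts)
    # A sample longer than the whole budget could never fit into any batch,
    # which is the situation the assertion in the issue guards against.
    assert batch_size >= longest, (
        f"batch_size ({batch_size}) must be >= longest sample ({longest} frames)"
    )

    batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, n_frames in enumerate(frame_counts):
        # Start a new batch when adding this sample would exceed the budget.
        if current and used + n_frames > batch_size:
            batches.append(current)
            current, used = [], 0
        current.append(idx)
        used += n_frames
    if current:
        batches.append(current)
    return batches


frame_counts = [300, 450, 120, 800]
print(batch_by_frames(frame_counts, batch_size=1000))  # [[0, 1, 2], [3]]
```

With `batch_size=64` and the same frame counts, the assertion fails, because no single sample fits into a 64-frame budget.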

w4-jonghoon commented 3 years ago

Thank you @zhaocq-nlp. I ran into this case in my MuST-C en-zh experiment. I need to figure out why the batch size is set to just 64. Thank you^^

zhaocq-nlp commented 3 years ago

@w4-jonghoon However, this applies only to training. During inference or evaluation, batch_size means the number of samples. That may be a little confusing. I will try to make it clearer. Thank you for the reminder.

w4-jonghoon commented 3 years ago

I think it is more acceptable. Thank you.