[Open] danpovey opened this issue 3 years ago
I think Batchnorm is per dimension, so the masked part will not affect the unmasked part?
That's the time index, not the feature dimension.
On Mon, Nov 9, 2020 at 10:18 PM Yiming Wang notifications@github.com wrote:
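The point about the time index can be seen with a tiny example (a NumPy sketch for illustration, not espresso code): BatchNorm's statistics are indeed per feature dimension, but they are reduced over both the batch and the time axes, so zero-padded frames still shift the mean and variance of every dimension.

```python
import numpy as np

# Two utterances with feature dim 1, lengths 3 and 1, padded to length 3
# with zeros. Shapes: (batch, time, feat). Padded frames are False in the mask.
x = np.array([[[1.0], [2.0], [3.0]],
              [[4.0], [0.0], [0.0]]])   # last two frames of utt 2 are padding
mask = np.array([[1, 1, 1],
                 [1, 0, 0]], dtype=bool)

# BatchNorm statistics are per feature dimension, but the reduction runs over
# batch AND time, so the padded zeros enter the mean (and the variance).
naive_mean = x.mean(axis=(0, 1))    # includes the padded zeros
masked_mean = x[mask].mean(axis=0)  # real frames only

print(naive_mean)   # [1.66666667]
print(masked_mean)  # [2.5]
```

The gap between the two means is exactly the contamination from padding, and it grows with the amount of padding in the batch.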
Oh OK. How does Kaldi deal with it? Does it guarantee the same length within a batch so there's no padding?
The issue doesn't arise in Kaldi; we use regular minibatches. In Lhotse our plan would be to use Lhotse itself to do the padding (which would add silence or noise), so we wouldn't be messing about with zeros throughout the propagation; it would be a real signal. But I expect batchnorm may support a mask matrix.
On Mon, Nov 9, 2020 at 11:50 PM Yiming Wang notifications@github.com wrote:
It looks like the batchnorm doesn't take into account the masking:
https://github.com/freewym/espresso/blob/6fca6cacd9d475d2676c527999e2d1bde08e7cbb/espresso/models/speech_tdnn.py#L170
Surely this isn't right? However, I don't know how to take the masking into account.
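One way to take the mask into account, sketched below in NumPy (illustrative only, not espresso's actual layer; a real `nn.Module` would also carry learnable scale/shift parameters and running statistics), is to compute the per-dimension statistics over the unmasked frames only and then re-zero the padded positions:

```python
import numpy as np

def masked_batch_norm(x, mask, eps=1e-5):
    """Normalize each feature dimension using statistics from real frames only.

    x:    (batch, time, feat) float array
    mask: (batch, time) bool array, True for real (non-padded) frames

    Hypothetical sketch: training-time forward pass only, no learnable
    gamma/beta, no running-statistics update.
    """
    frames = x[mask]                 # (n_real_frames, feat): drop padded frames
    mean = frames.mean(axis=0)       # per-dimension mean over real frames
    var = frames.var(axis=0)         # per-dimension variance over real frames
    y = (x - mean) / np.sqrt(var + eps)
    return y * mask[..., None]       # zero out the padded positions again

x = np.array([[[1.0], [2.0], [3.0]],
              [[4.0], [0.0], [0.0]]])
mask = np.array([[1, 1, 1],
                 [1, 0, 0]], dtype=bool)
y = masked_batch_norm(x, mask)
print(y[mask].mean(axis=0))   # ≈ [0.]: real frames now have zero mean
```

Since the padded positions are zeroed after normalization, they contribute nothing to subsequent masked layers either.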