ZhengkunTian / Speech-Tranformer-Pytorch

Seq2Seq Speech Recognition with Transformer on Mandarin Chinese

a question about embedding layer before encoder #2

Closed huangnengCSU closed 5 years ago

huangnengCSU commented 5 years ago

Hi: In the code, you replace the src_embed layer with a linear layer, so the encoder output has the same sequence length as the input and the src_padding_mask matrix is easy to compute. But if we replace the src_embed layer with a convolution layer (maybe together with pooling) for better performance, the encoder output length may differ from the input length. In that case, how can we compute the src_padding_mask?
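
For reference, one common way to handle a length-changing front end is to track the per-utterance lengths through the same output-length formula the conv/pooling layer uses, then rebuild the padding mask from the new lengths. Below is a minimal sketch of that idea; the helper names (`conv_out_len`, `make_pad_mask`, `input_lengths`) are illustrative and not taken from this repository:

```python
import torch

def conv_out_len(lengths, kernel_size, stride, padding):
    # Standard Conv1d / MaxPool1d output-length formula (dilation=1).
    return (lengths + 2 * padding - kernel_size) // stride + 1

def make_pad_mask(lengths, max_len):
    # True where a position is padding.
    arange = torch.arange(max_len).unsqueeze(0)   # (1, max_len)
    return arange >= lengths.unsqueeze(1)         # (batch, max_len)

input_lengths = torch.tensor([100, 73, 58])
# e.g. a Conv1d with kernel_size=3, stride=2, padding=1 roughly halves the length
new_lengths = conv_out_len(input_lengths, kernel_size=3, stride=2, padding=1)
src_padding_mask = make_pad_mask(new_lengths, max_len=int(new_lengths.max()))
```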

ZhengkunTian commented 5 years ago

Hi, I'm sorry it took me so long to see this. If you want to use a convolution layer as the src_embed layer, a 1D conv layer is a good choice: you can set the number of output channels to the model dimension and choose the stride and padding so that the output length equals the input length. Best regards.
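
A minimal sketch of that suggestion: a 1D convolution used as src_embed with stride 1 and "same" padding, so the time dimension is unchanged and the original src_padding_mask still applies. The class and variable names are illustrative, not from the repository:

```python
import torch
import torch.nn as nn

class ConvEmbed(nn.Module):
    def __init__(self, input_dim, d_model, kernel_size=3):
        super().__init__()
        # padding=(kernel_size - 1) // 2 preserves the time dimension
        # for odd kernel sizes when stride=1.
        self.conv = nn.Conv1d(input_dim, d_model,
                              kernel_size=kernel_size,
                              stride=1,
                              padding=(kernel_size - 1) // 2)

    def forward(self, x):
        # x: (batch, time, input_dim); Conv1d expects (batch, channels, time)
        x = self.conv(x.transpose(1, 2))
        return x.transpose(1, 2)  # (batch, time, d_model), time unchanged

feats = torch.randn(4, 100, 80)           # e.g. 80-dim filterbank features
embed = ConvEmbed(input_dim=80, d_model=256)
out = embed(feats)
assert out.shape == (4, 100, 256)         # same length, so the mask is reusable
```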