Closed LittleFlyingSheep closed 3 years ago
Hi,
Keeping only a summary of the input sequence is just one way of doing sequence-to-sequence processing.
Thanks for your reply. Can I understand it as follows? The encoder (GRUs) processes the input sequence of audio features and outputs a sequence of hidden features. We then select the last time step of the output sequence as the summary of the whole sequence, and expand it to max_out_t_steps time steps.
The expansion to max_out_t_steps is just a way to re-use the summary at every time step of the decoder. :)
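For anyone who wants to check the shapes, here is a minimal sketch of the select-and-expand step. It is not the baseline code: the sizes are made up, and a single torch.nn.GRU stands in for the encoder, which may differ from the one in the baseline.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
batch_size, in_t_steps, in_features, hidden = 2, 5, 4, 3
max_out_t_steps = 6

torch.manual_seed(0)
# Stand-in encoder (assumption): one GRU layer with batch-first inputs.
encoder = torch.nn.GRU(in_features, hidden, batch_first=True)
x = torch.randn(batch_size, in_t_steps, in_features)

enc_out, _ = encoder(x)        # (batch, in_t_steps, hidden)
summary = enc_out[:, -1, :]    # (batch, hidden): keep the last time step only

# Add a time axis of length 1, then repeat it along that axis as a view
# (expand does not copy the data).
h_encoder = summary.unsqueeze(1).expand(-1, max_out_t_steps, -1)

print(h_encoder.shape)  # torch.Size([2, 6, 3])
```

Every slice h_encoder[:, t, :] is the same summary vector, so the decoder sees the same encoder information at each output step.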
Hi,
I'm closing this issue. If you have any further questions, please feel free to create another issue.
In the baseline, there is a line of code in 'baseline_dcase.py':
h_encoder: Tensor = self.encoder(x)[:, -1, :].unsqueeze(1).expand(-1, self.max_out_t_steps, -1)
Why does the baseline keep only the last time step of the encoder output?