hubeibei007 closed this issue 4 years ago
I assume you modified this code to something of this sort. Can you submit a pull request?
```python
# Downsample the true mel lengths by the conv stack's total stride (2 per layer)
lens = (lens.cpu().numpy() / 2 ** len(self.convs))
lens = lens.round().astype(int)
# Pack so the GRU ignores padded frames when producing its final hidden state
out = nn.utils.rnn.pack_padded_sequence(
    out, lens, batch_first=True, enforce_sorted=False)
self.gru.flatten_parameters()
_, out = self.gru(out)
```
OK, I will prepare and submit a pull request.
The ReferenceEncoder only takes the mels as input. During training or batch inference, the mels are padded, so it seems the ReferenceEncoder should use the actual mel lengths to get the last GRU hidden state. Please correct me if I have misunderstood anything.
Adding my test results: when the ReferenceEncoder gets the last hidden state via pack_padded_sequence, the style-token clusters on my dataset are noticeably clearer than without pack_padded_sequence.
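For anyone following along, here is a minimal self-contained sketch of the idea being discussed: a reference encoder that downsamples the true mel lengths by the conv stack's stride and packs the sequence before the GRU, so padding does not leak into the final hidden state. The class name, layer sizes, and stride-2 conv stack are my own assumptions for illustration, not the repo's actual code.

```python
import torch
import torch.nn as nn


class ReferenceEncoderSketch(nn.Module):
    """Hypothetical minimal reference encoder; sizes are illustrative only."""

    def __init__(self, n_mels=80, conv_channels=(32, 32), gru_units=128):
        super().__init__()
        self.convs = nn.ModuleList()
        in_ch = 1
        for out_ch in conv_channels:
            # Each stride-2 conv halves both the time and frequency axes
            self.convs.append(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1))
            in_ch = out_ch
        # Frequency dimension after the stride-2 downsampling
        freq = n_mels
        for _ in conv_channels:
            freq = (freq + 1) // 2
        self.gru = nn.GRU(in_ch * freq, gru_units, batch_first=True)

    def forward(self, mels, lens):
        # mels: (B, T, n_mels); lens: true frame counts before padding
        out = mels.unsqueeze(1)  # (B, 1, T, n_mels)
        for conv in self.convs:
            out = torch.relu(conv(out))
        b, c, t, f = out.shape
        out = out.transpose(1, 2).reshape(b, t, c * f)  # (B, T', C*F')
        # Downsample the true lengths to match the conv output's time axis,
        # then pack so the GRU's last hidden state comes from the real end
        # of each utterance rather than from padded frames
        lens = (lens.cpu().numpy() / 2 ** len(self.convs))
        lens = lens.round().astype(int).clip(min=1)
        packed = nn.utils.rnn.pack_padded_sequence(
            out, lens.tolist(), batch_first=True, enforce_sorted=False)
        self.gru.flatten_parameters()
        _, hidden = self.gru(packed)
        return hidden.squeeze(0)  # (B, gru_units)


enc = ReferenceEncoderSketch()
mels = torch.randn(2, 100, 80)          # batch of 2, padded to 100 frames
lens = torch.tensor([100, 60])          # second utterance is shorter
embedding = enc(mels, lens)             # one style embedding per utterance
```

With two stride-2 convs the 100-frame input becomes 25 time steps, and the lengths 100 and 60 are downsampled to 25 and 15, so packing stops the GRU at step 15 for the shorter utterance instead of reading 10 steps of padding.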