NVIDIA / flowtron

Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
https://nv-adlr.github.io/Flowtron
Apache License 2.0

Back step problem #55

Closed: VadimOvchinnikovA closed this issue 4 years ago

VadimOvchinnikovA commented 4 years ago

I can't understand why you iterate from 1 in the forward method of AR_Back_Step:

```python
def forward(self, mel, text, mask, out_lens):
    mel = torch.flip(mel, (0, ))
    # backwards flow, send padded zeros back to end
    for k in range(1, mel.size(1)):
        mel[:, k] = mel[:, k].roll(out_lens[k].item(), dims=0)

    mel, log_s, gates, attn = self.ar_step(mel, text, mask, out_lens)

    # move padded zeros back to beginning
    for k in range(1, mel.size(1)):
        mel[:, k] = mel[:, k].roll(-out_lens[k].item(), dims=0)

    return torch.flip(mel, (0, )), log_s, gates, attn
```

rafaelvalle commented 4 years ago

mel has shape (n_frames, batch), and the batch dimension is sorted by decreasing mel length. This guarantees that the first sample (k = 0) is never padded, so it has no padded zeros to send to the end and the loop can start at k = 1.
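To make the roll concrete, here is a minimal sketch (toy shapes and values, not from the repo) showing how flipping time and then rolling column k by out_lens[k] sends a padded sample's trailing zeros to the end:

```python
import torch

# Toy tensor of shape (n_frames, batch); the mel-channel dimension is
# omitted for readability. Sample 0 is full length (5 frames); sample 1
# has 3 valid frames followed by padding zeros (hypothetical values).
out_lens = torch.tensor([5, 3])
mel = torch.tensor([[1., 10.],
                    [2., 20.],
                    [3., 30.],
                    [4.,  0.],
                    [5.,  0.]])

mel = torch.flip(mel, (0,))   # reverse time: zeros now lead sample 1
print(mel[:, 1])              # tensor([ 0.,  0., 30., 20., 10.])

# Starting at k = 1 skips sample 0, which is guaranteed unpadded.
for k in range(1, mel.size(1)):
    mel[:, k] = mel[:, k].roll(out_lens[k].item(), dims=0)
print(mel[:, 1])              # tensor([30., 20., 10.,  0.,  0.])
```

After the autoregressive step, rolling by -out_lens[k] and flipping again restores the original layout, which is exactly what the second loop and the final torch.flip in AR_Back_Step.forward do.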

rafaelvalle commented 4 years ago

Heads up that we changed the code to cover cases where the sample with the longest text does not have the longest mel.
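For reference, a minimal sketch of how such a fix could look (an assumption based on the discussion above, not the repo's actual diff): Tensor.roll is modular, so rolling a full-length column by n_frames is an identity, and the loops can simply start at k = 0 instead of assuming sample 0 is unpadded:

```python
import torch

# Hypothetical sketch (assumed fix, not the actual repo change):
# rolling a full-length column by n_frames wraps all the way around,
# leaving it unchanged, so no special case is needed for sample 0.
n_frames = 5
col = torch.arange(1., n_frames + 1)                 # a full-length sample
assert torch.equal(col.roll(n_frames, dims=0), col)  # roll by n_frames is a no-op

# Inside AR_Back_Step.forward, the loops could then become:
#     for k in range(mel.size(1)):
#         mel[:, k] = mel[:, k].roll(out_lens[k].item(), dims=0)
```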