I tried the batch inference in XTTS, So I am doing padding till the max text sequence in the batch and also adding the attention mask for this, But for shorter sequences,
I am getting some random noise at the end of the audio
It would be helpful if we get this feature in Coqui tts.
I tried the batch inference in XTTS, So I am doing padding till the max text sequence in the batch and also adding the attention mask for this, But for shorter sequences, I am getting some random noise at the end of the audio It would be helpful if we get this feature in Coqui tts.