Open lpscr opened 2 months ago
Hi, could the noise be coming from the padded part of the audio?
If the validation data is taken from a batch, note that StableTTS does not sort the sequences in a batch in descending order of length the way the official VITS source code does. Shorter samples in the batch are therefore padded at the end, and that padded region may be what you are hearing.
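If padding is the cause, one thing to try is cutting each sample back to its real length before logging it. Below is a minimal sketch of that idea, not StableTTS code: `trim_padding`, `mel_length`, and the default `hop_length=256` are hypothetical names and values for illustration, so substitute the lengths your dataloader returns and the hop size from your own config.

```python
def trim_padding(waveform, mel_length, hop_length=256):
    """Return only the samples that correspond to real (unpadded) mel frames.

    waveform:   1-D sequence of audio samples (list, NumPy array, or 1-D tensor)
    mel_length: number of valid mel frames for this item, before batch padding
    hop_length: samples per mel frame; 256 is an assumed value, use your config's
    """
    if mel_length < 0 or hop_length <= 0:
        raise ValueError("mel_length must be >= 0 and hop_length must be > 0")
    # Each mel frame covers hop_length audio samples, so everything past
    # mel_length * hop_length was generated from padding.
    valid_samples = min(mel_length * hop_length, len(waveform))
    return waveform[:valid_samples]


# Toy example: 3 valid frames with hop 4, followed by 8 padded samples.
padded = [0.1] * 12 + [0.0] * 8
trimmed = trim_padding(padded, mel_length=3, hop_length=4)
print(len(trimmed))  # 12
```

The trimmed waveform can then be passed to TensorBoard's `SummaryWriter.add_audio` together with your sample rate. If the noise at the end disappears after trimming, padding was the likely source.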
Hi @KdaiP
I’m trying to add TensorBoard to visualize the mel spectrogram and audio as shown below, so that the audio can be played back for each epoch.
I got it working, but there is a lot of noise at the end of the audio (the part I marked with a red rectangle), which makes it very difficult to listen to. How can I remove this noise? Is it related to the reference and original mel?
Here is the code I’m using for training:
Code I use in training