TensorBoard to visualize the MEL and audio?

KdaiP / StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

MIT License

365 stars 42 forks

Hi @KdaiP

I’m trying to add TensorBoard logging to visualize the MEL and audio as shown below, so that you can play back the audio for each epoch.

I managed to get it working, but there is a lot of noise at the end of the audio (the part I marked with a red rectangle), which makes it very difficult to listen to. How can I remove this noise? Is it related to the reference and original MEL?

Here is the code I’m using for training:


for epoch in range(current_epoch, train_config.num_epochs):  # loop over the training dataset multiple times
    ...
    # synthesise MELs for logging and keep only the decoder output
    mels = model.module.synthesise(x, x_lengths, 25, 1.0, y, 1.0, "euler", 3.0)['decoder_outputs']
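
One possible cause of noise at the end, which is an assumption and not something confirmed in this thread, is that MELs in a batch are padded to a common length and the padded frames are vocoded along with the real ones. The sketch below shows how each MEL could be trimmed to its valid length and scaled for display before logging. The helper names `trim_padding` and `mel_to_image`, the batch shape, and the per-item lengths are all hypothetical and are not part of StableTTS.

```python
import numpy as np

def trim_padding(mel, valid_frames):
    """Drop padded frames from a (n_mels, frames) MEL spectrogram."""
    return mel[:, :valid_frames]

def mel_to_image(mel):
    """Scale a MEL to [0, 1] and flip it so low frequencies sit at the bottom."""
    lo, hi = float(mel.min()), float(mel.max())
    scaled = (mel - lo) / max(hi - lo, 1e-8)  # guard against a constant MEL
    return scaled[::-1].copy()

# Hypothetical batch: 2 items, 80 MEL bins, padded with zeros to 6 frames.
rng = np.random.default_rng(0)
batch = rng.uniform(-11.0, 2.0, size=(2, 80, 6)).astype(np.float32)
lengths = [6, 4]            # assumed valid frame count per item
batch[1, :, 4:] = 0.0       # zero padding on the shorter item

trimmed = [trim_padding(m, n) for m, n in zip(batch, lengths)]
images = [mel_to_image(m) for m in trimmed]
```

Each entry of `images` could then be passed to `SummaryWriter.add_image` with `dataformats="HW"`, and the waveform vocoded from the matching entry of `trimmed` to `SummaryWriter.add_audio`. Note that `add_audio` defaults to `sample_rate=44100`, so the vocoder's actual sample rate should be passed explicitly.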


TensorBoard to visualize the MEL and audio? #25