Open · LEECHOONGHO opened this issue 4 months ago
Do you have standard TensorBoard logs? It would be interesting to compare.
@patriotyk Sorry, I've changed the code to log to a WandB server. I have no local logging files or TensorBoard logs.
What is your validation loss on the last checkpoint? It is encoded into the checkpoint file name. I have been training the 44100 Hz model for almost a week already, and the loss is still going down.
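Since the validation loss is said to be encoded in the checkpoint file name, it can be recovered with a small parser. This is a minimal sketch: the exact filename pattern (a Lightning-style `val_loss=…` field, and the example name itself) is an assumption, not something confirmed in this thread.

```python
import re

def parse_val_loss(filename):
    """Extract the validation loss from a checkpoint file name.

    Assumes a PyTorch-Lightning-style pattern such as
    'vocos_epoch=12_step=34000_val_loss=0.3251.ckpt' (hypothetical name).
    Returns None when no val_loss field is present.
    """
    match = re.search(r"val_loss=([0-9]*\.?[0-9]+)", filename)
    return float(match.group(1)) if match else None
```

For example, `parse_val_loss("vocos_epoch=12_step=34000_val_loss=0.3251.ckpt")` yields `0.3251`, while `parse_val_loss("last.ckpt")` yields `None`.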
Training loss and generated outputs: https://api.wandb.ai/links/xi-speech-team/k0kdfwch
I hope this will serve as a reference for model training.
Thanks for your work! Could you share the 32k model training details, e.g. which EnCodec model you used? (I found pretrained models at 24 kHz and 48 kHz, so I guess you resample 32 kHz audio to 24 kHz or 48 kHz for the pretrained EnCodec model and then resample back to 32 kHz?)
Sorry for the confusion. I trained only the mel vocoder, not a decoder for EnCodec codes.
But I have plans to train a Mel-EnCodec (a mel-spectrogram-to-RVQ encoder with a Vocos decoder, for various speech data) in the future.
@patriotyk I estimated the mel loss and the generator loss on a newly collected dataset; they were 0.0942 and 2.82 respectively. Because of the dataset's size, evaluating on the eval set shows no meaningful difference from evaluating on sampled training data.
How is your model's output quality? Any artifacts?
I am still training (third week). It is very slow. I will update with my results when it finishes.
How much data do we need for training?
@LEECHOONGHO I have published my model here: https://huggingface.co/patriotyk/vocos-mel-hifigan-compat-44100khz It sounds great, and metrics are included. @Mahmoud-ghareeb My model was trained on 800+ hours of audio. A vocoder doesn't require text transcripts, so you can easily use audiobooks for training. You don't even need to cut them on silence, because Vocos internally splits the provided audio into smaller segments anyway.
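The internal splitting mentioned above amounts to sampling fixed-length training segments from each file. Here is a minimal illustration of that idea, assuming plain sample sequences and zero-padding of short clips; it is a sketch of the general technique, not the actual Vocos data loader:

```python
import random

def random_segment(samples, segment_len, rng=random):
    """Return one random fixed-length training segment from a waveform.

    `samples` is a sequence of audio samples. Clips shorter than
    `segment_len` are zero-padded (a common convention, assumed here);
    longer clips yield a uniformly chosen contiguous crop.
    """
    if len(samples) <= segment_len:
        return list(samples) + [0] * (segment_len - len(samples))
    start = rng.randrange(len(samples) - segment_len + 1)
    return list(samples[start:start + segment_len])
```

Because each epoch draws different crops, long uncut recordings such as audiobook chapters still provide varied training examples.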
Great work, @patriotyk! Thank you so much.
I'm new to this... Could you please tell me what the purpose of sharing the model is? I mean, when I try to use it with a WAV file, the output is very close to the original input file, so I'm confused here.
Thank you
This model generates audio from mel spectrograms. The functionality you tried just generates a mel from the audio and then the audio back from the mel. But real TTS systems generate mels directly from text, and then the vocoder turns them into audio.
Ah, OK, so generating a mel from audio is different from what TTS systems do? Is there any code snippet that would let me test the model you trained (and possibly others)? Thank you!