anhnh2002 opened this issue 2 months ago
Hello everyone, below is my code for fine-tuning XTTS for a new language. It works well in my case with over 100 hours of audio.
https://github.com/nguyenhoanganh2002/XTTSv2-Finetuning-for-New-Languages
Hello, man. I'm very pleased with your contribution. Can you provide your trained models? I want to check if they are working well.
Due to copyright issues, I am currently unable to share the model's weights with you. I apologize for the inconvenience.
How long did it take you to train on 100 hours of audio, and can you tell me your computer configuration?
It took over 8 hours to train on 100 hours of audio on a single A100 40GB.
Understandable. However, would you be able to share a snippet of audio that the model has produced?
Please find the relevant file at the following Google Drive link:
Hi, what loss did you reach, and how many steps did you train for?
Is it possible to train the XTTSv2 model on about 10 hours of audio, and can it work well based on these 10 hours alone?
Actually, I trained the model with your code and reached a loss of 0.5, but when I used the model, the output was very bad and nothing was audible. I used the google/fleurs dataset for the Farsi language. First I expanded the vocab, then trained the DVAE, and then trained the model for 10,000 steps. Why do you think I am getting such bad results?
Thank you very much
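The vocab-expansion step mentioned above boils down to learning new subword tokens from text in the new language and appending any that are missing to the existing vocabulary. The repo has its own script for this; below is only a conceptual sketch of BPE-style merging in plain Python (all names are illustrative, and a real tokenizer would also persist merge rules, not just the token strings):

```python
from collections import Counter

def learn_bpe_tokens(corpus, num_merges):
    """Learn BPE merge tokens from a list of words (conceptual sketch)."""
    # Start from character-level symbol sequences, weighted by word frequency.
    vocab = Counter(tuple(word) for word in corpus)
    merged_tokens = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merged_tokens.append("".join(best))
        # Re-segment every word with the new merge applied.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merged_tokens

def extend_vocab(existing_vocab, corpus, num_merges=100):
    """Append newly learned tokens that the existing vocab lacks."""
    new_tokens = [t for t in learn_bpe_tokens(corpus, num_merges)
                  if t not in existing_vocab]
    return list(existing_vocab) + new_tokens
```

The key point is that existing token IDs are kept stable and new-language tokens are only appended, so the pretrained embeddings remain valid.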
First, I recommend you do not train the DVAE (because you have a small amount of data). And I think 10 hours is not enough; it makes the model overfit to your data. The losses I got are about 0.8.
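One practical way to catch the overfitting described above is to track the evaluation loss alongside the training loss and stop when it plateaus. This is a generic early-stopping check, not something from the repo; the function name and defaults are illustrative:

```python
def should_stop_early(eval_losses, patience=3, min_delta=0.0):
    """Return True when eval loss has not improved for `patience` evaluations.

    eval_losses: list of eval-loss values, one per evaluation, oldest first.
    """
    if len(eval_losses) <= patience:
        return False  # not enough history yet
    best_before = min(eval_losses[:-patience])
    recent = eval_losses[-patience:]
    # Stop only if none of the recent evals beat the earlier best.
    return all(loss >= best_before - min_delta for loss in recent)
```

With only ~10 hours of data, the eval loss often starts rising while the training loss keeps falling; a check like this stops training before the model memorizes the set.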
Thanks for your good work and your reply. I did that, and these are my losses:
> avg_loader_time: 0.18475866317749023 (+0.00680994987487793)
> avg_loss_text_ce: 0.036836352199316025 (-0.0016442164778709412)
> avg_loss_mel_ce: 0.03139156103134155 (-0.001425366848707199)
> avg_loss: 0.06822791695594788 (-0.003069579601287842)
But when I run inference on one of the sentences the model was trained on, I get bad audio that is not in the trained language; the sound that is produced is not close to the trained language at all.
How many epochs and steps are required for training on 100 hours of data? And how many hours did it take, my friend?
Hi, nice work! You might want to try to create a merge request for it into a still maintained fork of coqui-ai: https://github.com/idiap/coqui-ai-TTS
I'm not involved with it, just an idea.
2 epochs work well for me
For a new language, after training do we also need to train the vocoder? And if the loss decreases below 1 but the model still reads the text incorrectly, what is your opinion about this? What do you advise me to do to solve this problem? Thank you.
I want to train on a limited number of sentences of a new language, for example 1000 sentences. What is your opinion about this? Is it possible? I don't want to train the model on the whole language.
I think it's not feasible with only 1000 sentences; the model would just overfit, especially for a new language. You'd need to extend the tokenizer and likely train a base model on a larger dataset of that language first.
Thank you very much. So your opinion is that my problem is the small amount of data: I cannot get good results from a model trained on a few sentences, and it must be trained on a large amount of data. I expanded the vocab and trained the DVAE. Honestly, I wanted to first test how the model performs when trained on little data, and then run it on a lot of data. Another question: what learning rate should I use, so that the model does not lose what it learned for the other languages, yet learns well and quickly for a new language on a lot of data?
Thank you for paying the zakat of your knowledge :)
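On the learning-rate question: a common fine-tuning pattern is a small peak learning rate with linear warmup and cosine decay, so the pretrained weights are not disturbed too abruptly at the start. Treat the peak value below (1e-5) as an assumption for illustration, not a repo default; a minimal sketch of the schedule:

```python
import math

def lr_at_step(step, total_steps, peak_lr=1e-5, warmup_steps=500):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The gentle warmup matters most here: starting fine-tuning at full learning rate is what tends to erase the model's existing languages.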
In short, is it not possible to teach a language with 10 letters and about 100 sentences, so that the model at least reads those 100 trained sentences correctly?
Hey, great work!
I have a question: I want to train this model on Vietnamese, but with vi-north and vi-south as separate languages, with separate metadata CSVs for each. Does the multi-dataset training option support this and shuffle the vi-north and vi-south data together while keeping their separate language labels?
Thank you in advance!
Yes, you can
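For anyone setting this up, a two-dataset configuration along these lines should work, assuming the coqui-style `BaseDatasetConfig` that XTTS recipes use; the paths, dataset names, and language codes below are illustrative, not taken from the repo:

```python
# Sketch of a two-dataset config; each entry carries its own metadata CSV
# and language label, and the trainer mixes samples from both.
from TTS.tts.configs.shared_configs import BaseDatasetConfig

config_dataset_north = BaseDatasetConfig(
    formatter="ljspeech",
    dataset_name="vi_north",
    path="datasets/vi_north",
    meta_file_train="metadata_north.csv",
    language="vi-north",
)
config_dataset_south = BaseDatasetConfig(
    formatter="ljspeech",
    dataset_name="vi_south",
    path="datasets/vi_south",
    meta_file_train="metadata_south.csv",
    language="vi-south",
)

DATASETS_CONFIG_LIST = [config_dataset_north, config_dataset_south]
```

Since each dataset config carries its own `language` field, samples keep their dialect label even after the loader shuffles the two datasets together.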