erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine. It is similar to the Coqui_tts extension for Text generation webUI, but supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models and wav file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0
864 stars, 98 forks

Need help :( #294

Closed: FNVDesigns closed this issue 1 month ago

FNVDesigns commented 1 month ago

Hey erew123,

First of all, I'm really a noob when it comes to computers, Python, etc., so please bear with me if I don't understand something right away.

I had version 1.8, I think. (Unfortunately, I don't know where to look up the version. The files in the folder are all dated February 10, 2024, if that helps.)

This version was perfect. With 99% of all audio samples, the generated audio was a fantastic copy of the original voice. I had a big project where I translated about 10 hours of English audio tracks into German and then used the "AllTalk TTS Generator" to have the text spoken in German.

I got sick and had to take a break for a few months. When I was healthy again, I saw that there was a new version, "1.9c". I downloaded it, and with the same audio samples I had used in the old version, the output suddenly sounded very "robotic". The voices are no longer copied correctly, and the flow of speech is not as smooth as in the old version.

Has anything changed in these versions that affects the quality negatively? I use XTTSv2 Local (xttsv2_2.0.2) in both versions.


I would simply use the old version again, but it can no longer be opened since I installed "1.9c". If I click on "start_alltalk.bat", the console window opens for half a second and closes again. After that, nothing happens.

So I thought, OK, I'll download the new beta version and do some model fine-tuning. That worked OK, at least with the one voice I trained.

However, I can't change the model used in the "AllTalk TTS Generator". (Or is that possible somewhere?)

I can only change the model in the "Generate TTS" tab. But I don't like it, because the generated audio track is not generated in "chunks" and I have no way to edit it afterwards. In German at least, the generated audio has too many errors, so it's ideal to be able to edit shorter "chunks" and regenerate them.


So I thought, OK, then I'll go back to "1.9c" and use the fine-tuned model there. I moved the "trainedmodel" folder to "alltalk_tts>models".

According to the documentation, the model should be detected and displayed in the start window for selection. But "XTTSv2 FT" is not displayed.

[screenshot]

Would it work if I moved the complete contents of the "trainedmodel" folder into the "xttsv2_2.0.2" folder and replaced the files there?


I apologize for the long post, and thank you in advance. Best wishes, and thank you for this great program!

erew123 commented 1 month ago

Hi @FNVDesigns

OK, a lot to work through here...

1) All previous builds can be downloaded as zip files from the releases page, https://github.com/erew123/alltalk_tts/releases, if you want to go back to one.

2) "the console window opens for half a second and closes again": open a command prompt, change into the AllTalk folder and run start_alltalk.bat from there. If an error is causing it to close, you will see the error message instead of the command prompt window just disappearing.
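If you would rather script this step, here is a minimal Python sketch of the same idea: start the launcher from its own folder and echo everything it prints as it arrives, so an error message stays on screen instead of vanishing with the window. The helper name `run_and_show` is hypothetical and not part of AllTalk, and the sketch assumes the batch file runs in the same console rather than opening a new window.

```python
import subprocess


def run_and_show(command, cwd):
    """Run `command` inside folder `cwd`, echo each output line as it arrives,
    and return (exit code, list of output lines).

    stderr is merged into stdout so error messages stay in the order they were printed.
    """
    lines = []
    with subprocess.Popen(
        command,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    ) as proc:
        for line in proc.stdout:
            print(line, end="")  # show the output live, like the console would
            lines.append(line.rstrip("\n"))
    # Leaving the `with` block waits for the process, so returncode is set here.
    return proc.returncode, lines
```

On Windows you would call it as `run_and_show(["cmd", "/c", "start_alltalk.bat"], cwd=r"C:\alltalk_tts")`, where the folder path is an assumption; `cmd /c` runs the batch file and exits when it finishes. Streaming line by line means the output is visible whether the launcher crashes immediately or keeps running.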

3) "Has anything changed in these versions that affects the quality negatively?" No, nothing springs to mind that would have changed that. I've had a glance over the commits to double-check, and there should be nothing that changes it. Generally speaking, though, make sure that the text you input appears in the AllTalk console/terminal window. If it does, it has been handed over to the Coqui TTS engine/AI model to generate, which at least proves the text was handed over correctly. Beyond that, assuming you are using the same model file and voice sample file, it should generate the same output.
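One way to confirm the "same model file and voice sample file" condition is to compare file hashes between the old and new installs. A minimal Python sketch, assuming you know where each copy lives; the function names and any paths you pass in are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks
    so large model files do not have to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_files(old_path, new_path):
    """Return True only if both files exist and their contents are identical."""
    old_path, new_path = Path(old_path), Path(new_path)
    if not (old_path.is_file() and new_path.is_file()):
        return False
    return sha256_of(old_path) == sha256_of(new_path)
```

For example, `same_files(r"old\models\xttsv2_2.0.2\model.pth", r"new\models\xttsv2_2.0.2\model.pth")`, where the folder names and the `model.pth` file name are assumptions about the install layout; the same call works for the voice sample .wav files. Matching hashes only show that the files are byte-for-byte identical.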

4) BETA VERSION: "However, I can't change the model used in the 'AllTalk TTS Generator'. (Or is that possible somewhere?)" You can swap TTS engines and models in the Gradio interface in the BETA version.

[screenshot]

[screenshot]

5) "But I don't like it, because the generated audio track is not generated in 'chunks'": the TTS Generator still exists in the BETA.

[screenshot]

6) "According to the documentation, the model should be detected and displayed in the start window for selection. But 'XTTSv2 FT' is not displayed." In v1, loading the finetuned model is a bit more particular about the folder: it must be /models/trainedmodel (the folder name specifically needs to be trainedmodel for v1 to detect it).

In the v2 BETA, any XTTS model placed in its own models/xtts/modelname folder is available for loading in the Gradio page shown above under point 4.

[screenshot]

The documentation is also available in the Gradio interface.

[screenshot]
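The two folder layouts from point 6 can be checked with a short Python sketch. It only tests that the folders exist where this reply says they should be (`models/trainedmodel` for v1, `models/xtts/<modelname>` for the v2 BETA); AllTalk itself may also expect specific model files inside them, and the function names are hypothetical.

```python
from pathlib import Path


def v1_finetuned_model_present(alltalk_root):
    """v1 layout: the fine-tuned model must sit in a folder named exactly
    'trainedmodel' directly under models/."""
    return (Path(alltalk_root) / "models" / "trainedmodel").is_dir()


def v2_xtts_models(alltalk_root):
    """v2 BETA layout: each XTTS model lives in its own folder under models/xtts/.
    Returns the sorted folder names, or an empty list if models/xtts/ is missing."""
    xtts_dir = Path(alltalk_root) / "models" / "xtts"
    if not xtts_dir.is_dir():
        return []
    # Only sub-folders count as models; loose files in models/xtts/ are ignored.
    return sorted(entry.name for entry in xtts_dir.iterdir() if entry.is_dir())
```

Pointing `alltalk_root` at a v1 install should return `True` from the first function once the folder is named `trainedmodel`; pointing it at a BETA install should list each model folder that the Gradio page can offer.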

The long and short of it, though, is that nothing should have changed audio-wise with generation.

Hope that gives you a few things to try/look at.

Thanks