Closed Johell1NS closed 5 months ago
Hello.
Currently the integrated TTS uses Silero TTS, which has no Italian model. I plan to add another default TTS, though, which supports a lot more languages (and might sound better too).
But you have a couple other options currently as well.
You can install the Plugins from inside the UI application in the Plugins Tab. You can see the current list of Plugins here: https://github.com/Sharrnah/whispering/blob/main/documentation/plugins.md
Hi, I'm trying to do what you told me, but it's not working. I installed the CoquiTTS plugin, and as a template I selected the one you indicated:
But when I go to General, I don't find anything in the language menu:
In fact, when I then go into the Text-to-Speech function, which is the one I was interested in, I don't see the plugin model:
Can you tell me how I can solve it? Many thanks, Alessio
Hi.
Make sure to check under "Advanced" -> "Logs" whether it is still downloading the AI model. The language field is most likely only filled once the AI model has finished downloading and loading.
Also make sure to disable the integrated Text-to-Speech when starting with the profile.
TTS Plugins should disable the integrated TTS automatically, but when I tested it yesterday, it didn't for me, so I might have a bug in that plugin.
The Model entries in the Text-to-Speech tab you show are still from the Silero TTS.
The Model dropdown in the Speech-to-Text tab is not used by the Coqui Plugin. Only the Voices dropdown is used, and only for some of the models (like the XTTS v2 model).
I hope that helps.
//EDIT: Made a small fix for the Plugin, so it should now disable the integrated TTS and also correctly start / stop when the Plugin is activated or disabled. (now version 1.1.7)
Hi, in the log I see that the model has been downloaded, but I also see a runtime error. Could this be why it doesn't work for me?
Hi.
I checked where it fails. That error probably happens because you still have a Silero TTS voice selected.
XTTS v2 needs either one of the precreated voices or a voice sample you want to clone.
So if you have Silero disabled, you should see a list of the default voices you can choose in the Text-to-Speech tab:
Or you select no voice:
And instead use a wav file for voice cloning in the plugin settings and have voice_change_source set to "Text to Speech":
(No, it's not using RVC. That was just an audio file I had lying around. You can use the RVC Plugin together with the Plugin though to get better voice cloning results.)
Hi, I added the file for cloning the voice, but in Text-to-Speech, if I try to click on the model and voice menus, I see an empty list and can't select anything. This has been happening since I installed the plugin; before that, the lists were populated.
Can you update the Plugin to version 1.1.7 and then, under Model, press Load Model?
This should restart the Coqui part and should also fill the language select box in the plugin once it has finished loading.
If that still doesn't help, you might try installing espeak-ng. Some TTS models in Coqui TTS require it. (Though I couldn't find any indication that XTTSv2 needs it.)
https://github.com/espeak-ng/espeak-ng/releases
Also, can you maybe try a different TTS model in the Plugin, for example tts_models/it/mai_male/vits?
You might have to set sample_rate_overwrite to 16000 for that model. (I think I missed that it uses a lower sample rate than most other models.)
And sorry for the issues. If we find the issue, I hope I can make it easier in the future. It seems Italian is quite rare in TTS. I checked the fairseq models, which support over 1000 languages, but couldn't find Italian in the list. 👀
I updated the plugin to version 1.1.7, then I tried to reload the model as you instructed, but nothing changed. I also installed espeak-ng, but again nothing new. Lastly, I tried to change the model to tts_models/it/mai_male/vits with sample_rate_overwrite set to 16000, but again it didn't work. The "language" menu under General is always empty.
The `tts_models/it/mai_male/vits` model is single language, so it doesn't fill the language menu under General.
But if even that model doesn't work, I guess there is something wrong in general with the TTS Backend. So maybe something went wrong while downloading / extracting it?
Can you try deleting Plugins\coqui_tts_plugin\coqui-tts and let the Plugin redownload the dependency?
And it would be really nice to see the Log if something fails. Otherwise it's hard to guess what exactly is wrong, unless it's the exact same error as previously.
--
Also, depending on what you intend to use it for, you can still try the Bark Plugin, which supports Italian. Or if you are okay with using a cloud-based TTS, you can give the Elevenlabs Plugin a try, though it costs some money if you use more than 10,000 characters per month.
If you know of any other TTS that supports Italian, I would be happy to have a look at it and see if I can implement it. I am really surprised that it seems to be so uncommon.
I deleted Plugins\coqui_tts_plugin\coqui-tts and let the plugin download the dependency again, but the problem didn't go away. I'm attaching the log file, but I don't understand why it's written in Japanese/Chinese (or I don't know which oriental language), I don't know if you understand anything about it, but I'm attaching it anyway. log.txt
Thank you @Johell1NS
It didn't directly point to the issue, but at least I think I can say that the errors in the log point to the Coqui process not correctly starting / stopping.
In combination with an earlier error, my current idea is that maybe the downloaded models are not loadable, which crashes the process.
It would be nice if you could check whether you have the folder Plugins\coqui_tts_plugin\tts\tts_models--multilingual--multi-dataset--xtts_v2 and whether it contains 5 files totaling about 1.74 GB?
Since the pth files are basically just zip files containing the model data, and an earlier error pointed to PyTorch not being able to read a zip file, I suspect that some files are corrupted.
You can try deleting the above-mentioned folder (or other model folders in the Plugins\coqui_tts_plugin\tts\ path) and let it redownload. You can follow the download in the Log if you do.
(I am working on some better way to show it outside the Log)
See also this issue I found about this possible problem: https://github.com/coqui-ai/TTS/issues/3605
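One way to test the corruption theory without waiting for a failed load is a small standalone script. The following is a minimal Python sketch, not part of the plugin: it sums the file sizes in a model folder and checks that every .pth file opens as a zip archive with intact checksums. The function name check_model_folder is hypothetical, and the check assumes the .pth files use the zip-based format described above; a file saved in another format would be reported as a problem even if it is fine.

```python
import zipfile
from pathlib import Path


def check_model_folder(folder):
    """Return (total_bytes, problems) for the files directly inside folder."""
    folder = Path(folder)
    files = [p for p in folder.iterdir() if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    problems = []
    for path in files:
        if path.suffix != ".pth":
            continue  # only the .pth files are expected to be zip archives
        if not zipfile.is_zipfile(path):
            problems.append(f"{path.name}: not a readable zip archive")
            continue
        try:
            with zipfile.ZipFile(path) as archive:
                # testzip() returns the first member with a bad CRC, or None.
                bad_member = archive.testzip()
        except zipfile.BadZipFile as error:
            problems.append(f"{path.name}: {error}")
            continue
        if bad_member is not None:
            problems.append(f"{path.name}: corrupted entry {bad_member}")
    return total_bytes, problems


if __name__ == "__main__":
    # Folder name taken from the comment above; run from the application folder.
    model_dir = Path(r"Plugins\coqui_tts_plugin\tts\tts_models--multilingual--multi-dataset--xtts_v2")
    if model_dir.is_dir():
        size, issues = check_model_folder(model_dir)
        print(f"{size / 1e9:.2f} GB in total")
        for issue in issues:
            print(issue)
```

If the script lists any problems, deleting the folder and letting the plugin redownload the model is still the fix; the script only helps to confirm the diagnosis.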
Hi, by reinstalling the entire plugin I managed to get it working, even if it doesn't seem very stable (I'm trying the xtts_v2 model). It accepts only short strings; otherwise it gives a warning. But even with short strings, the first playback seems to go well, and then if I repeat it, perhaps after changing the voice, it starts to say nonsense words. Basically, every time I want to play something new I have to restart the Backend. Another error I found is that it pronounces punctuation marks.
Hi. Nice that you got it working.
VAD Speech confidence and Speech volume Level to compensate for that. (Optionally with the AI Denoise option, though that does not influence the recording trigger.)
Hi, I disabled Speech2Text and actually no longer had hallucinations.
As for the warning, I'll give you the screenshot because I can't copy and paste from the log. P.S. Why do I see the log file in Japanese or oriental characters?
As for punctuation, from my tests only the period (.) confuses it, I'm replacing it with the semicolon (;). In my opinion, punctuation marks cannot be excluded, because they contribute to giving the right rhythm that we desire to the pronunciation of the sentence.
I played a bit with it yesterday too and had that warning. I guess either Coqui has no sentence splitting integrated, or it has but still warns (since it can sometimes be troublesome to split at the right places).
I could implement custom sentence splitting if that is really causing trouble, but I had no issue generating a rather long text.
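For illustration, custom sentence splitting could be sketched roughly as follows. This is an assumption-laden Python example, not the plugin's or Coqui's actual code: the name split_sentences and the 200-character limit are hypothetical, and the regular expression only breaks after ., !, ? or ; when followed by whitespace.

```python
import re


def split_sentences(text, max_chars=200):
    """Group sentences into chunks, aiming for at most max_chars per chunk."""
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?;])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # still fits, keep growing the chunk
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than max_chars is kept whole.
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to the TTS model separately and the resulting audio concatenated, which keeps every request short while still splitting only at natural pauses.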
Regarding the punctuation: if it has no issue with other types of punctuation, I could add a replace function for these. But I agree that they are important for pronunciation.
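Such a replace function might look like the minimal Python sketch below. It is only an illustration of the workaround reported above (swapping the period for a semicolon), not the plugin's implementation; the name soften_periods is hypothetical. It assumes that only periods at the end of a sentence should be replaced, so periods inside numbers such as 3.50 are left alone.

```python
import re


def soften_periods(text, replacement=";"):
    """Replace periods that end a sentence (followed by whitespace or end of text)."""
    return re.sub(r"\.(?=\s|$)", replacement, text)


print(soften_periods("Ciao. Come stai?"))
```

Other punctuation marks are untouched, so the rhythm they give to the sentence is preserved.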
I don't know why you see some Japanese characters. It could be that it is confused by some special characters I also saw in the log file you posted. I wouldn't worry too much about it. :)
--
I also worked on and improved the sentence splitting for Bark yesterday. I have a beta version of the Bark plugin with it integrated. The official release will come with the next Whispering Tiger release, or I can post it here (or you can find it in the Discord server) in case you are interested in trying it out.
Thanks so much for the support!
In the "Text-to-Speech" function I cannot find a model that supports Italian. Is it possible to use "Text-to-Speech" with Italian text and have the corresponding audio generated in Italian? If so, how? Thank you so much.