Text-to-Speech in Italian?

Johell1NS commented 6 months ago

In the "Text-to-Speech" function I cannot find a model that supports Italian. Is it possible to use "Text-to-Speech" with Italian text and have the audio generated in Italian? What do you think? Thank you so much.

Sharrnah commented 6 months ago

Hello.

Currently the integrated TTS uses Silero TTS, which has no Italian model. I plan to add another default TTS, though, which supports a lot more languages (and might sound better too).

But you currently have a couple of other options as well.

You can install the CoquiTTS Plugin, which has many TTS models. Its newest model, XTTSv2, also supports Italian.
You can install the Bark Plugin, which also supports Italian. (Though it's less "stable", meaning it can sometimes speak random stuff, and it is most likely slower too.)

You can install the Plugins from inside the UI application in the Plugins Tab. You can see the current list of Plugins here: https://github.com/Sharrnah/whispering/blob/main/documentation/plugins.md

Johell1NS commented 6 months ago

Hi, I'm trying to do what you told me but it's not working. I installed the CoquiTTS plugin, and as the model I selected the one you indicated: [screenshot] But when I go to General, I don't find anything in the language menu: [screenshot] And when I go to the Text-to-Speech function, which is the one I'm interested in, I don't see the plugin's model: [screenshot] Can you tell me how I can solve this? Many thanks, Alessio

Sharrnah commented 6 months ago

Hi.

Make sure to check under "Advanced" -> "Logs" whether it is still downloading the AI model. The language field is most likely only filled once the AI model has finished downloading and loading.

[screenshot]

Also make sure to disable the integrated Text-to-Speech when starting with the profile. [screenshot]

TTS Plugins should disable the integrated TTS automatically, but when I tested it yesterday, it didn't for me, so I might have a bug in that plugin.

The Model entries in the Text-to-Speech tab you show are still from the Silero TTS.

The Model dropdown in the Text-to-Speech tab is not used by the Coqui Plugin. Only the Voices dropdown is used, and only for some of the models (like the XTTS v2 model).

I hope that helps.

//EDIT: Made a small fix for the Plugin, so it should now disable the integrated TTS and also start / stop correctly when the Plugin is activated or disabled. (Now version 1.1.7.)

Johell1NS commented 6 months ago

Hi, in the log I see that the model has been downloaded, but I also see a runtime error. Could this be why it doesn't work for me?

Sharrnah commented 6 months ago

Hi.

I checked where it fails. That error probably happens because you still have a Silero TTS voice selected.

XTTS v2 needs either one of the pre-created voices or a voice sample you want to clone.

So if you have Silero disabled, you should see a list of the default voices you can choose from in the Text-to-Speech tab: [screenshot]

Or you select no voice: [screenshot]

And instead use a wav file for voice cloning in the plugin settings and have voice_change_source set to "Text to Speech": [screenshot] (No, it's not using RVC. That was just an audio file I had lying around. You can use the RVC Plugin together with the Plugin, though, to get better voice cloning results.)

Johell1NS commented 6 months ago

Hi, I added the file for cloning the voice, but in Text-to-Speech, if I try to click on the model and voice menus, I see an empty list and can't select anything. This has been happening since I installed the plugin; before that, the lists were populated.

Sharrnah commented 6 months ago

Can you update the Plugin to version 1.1.7?

And then under Model press Load Model?

This should restart the Coqui part and should also fill the language select box in the plugin once it has finished loading. [screenshot]

If that still doesn't help, you might try installing espeak-ng. Some TTS models in Coqui TTS require it. (Though I couldn't find any indication that XTTSv2 needs it.)

https://github.com/espeak-ng/espeak-ng/releases

Also, can you maybe try a different TTS model in the Plugin? For example tts_models/it/mai_male/vits? You might have to set sample_rate_overwrite to 16000 for that model. (I think I missed that this one uses a lower sample rate than most other models.) [screenshot]
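As a rough illustration of why the sample rate setting matters: if a model produces audio at 16000 Hz but the playback side assumes a higher rate, the same samples play back faster (and higher-pitched). The sketch below only does the arithmetic. `playback_seconds` is a hypothetical helper, and the 24000 Hz figure is just an assumed example of a mismatched rate, not a value confirmed for any particular model or for the plugin.

```python
def playback_seconds(num_samples: int, sample_rate: int) -> float:
    """Duration of a clip when its samples are played at the given rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


# Two seconds of audio as generated by a 16000 Hz model.
num_samples = 2 * 16000

correct = playback_seconds(num_samples, 16000)   # rate matches the model
too_fast = playback_seconds(num_samples, 24000)  # assumed mismatched rate

print(f"matching rate: {correct:.2f}s, mismatched rate: {too_fast:.2f}s")
```

With a matching rate the clip keeps its intended length; with the mismatched rate it is compressed in time, which is the kind of symptom an overwrite setting like sample_rate_overwrite is presumably there to fix.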

And sorry for the issues. If we find the cause, I hope I can make this easier in the future. It seems Italian is quite rare in TTS. I checked the fairseq models, which support over 1000 languages, but couldn't find Italian in the list. 👀

Johell1NS commented 6 months ago

I updated the plugin to version 1.1.7, then I tried to reload the model as you instructed, but nothing changed. I also installed espeak-ng, but again nothing changed. Lastly, I tried to change the model to tts_models/it/mai_male/vits with sample_rate_overwrite set to 16000, but again it didn't work. The "language" menu under General is always empty.

Sharrnah commented 5 months ago

The `tts_models/it/mai_male/vits` model is single-language, so it doesn't fill the language menu under General.

But if even that model doesn't work, I guess there is something wrong in general with the TTS backend. So maybe something went wrong while downloading / extracting it?

Can you try deleting Plugins\coqui_tts_plugin\coqui-tts and let the Plugin redownload the dependency?

And it would be really nice to see the log if something fails. Otherwise it's hard to guess what exactly is wrong, unless it's the exact same error as before.

--

Also, depending on what you intend to use it for, you can still try the Bark Plugin, which supports Italian. Or if you are okay with using a cloud-based TTS, you can give the Elevenlabs Plugin a try, though it costs some money if you use more than 10,000 characters per month.

If you know of any other TTS that supports Italian, I would be happy to have a look at it and see if I can implement it. I am really surprised that it seems to be so uncommon.

Johell1NS commented 5 months ago

I deleted Plugins\coqui_tts_plugin\coqui-tts and let the plugin download the dependency again, but the problem didn't go away. I'm attaching the log file, but I don't understand why it's written in Japanese/Chinese characters (I can't tell which language it is). I don't know if you can make anything of it, but I'm attaching it anyway. log.txt

Sharrnah commented 5 months ago

Thank you @Johell1NS

It didn't directly point to the issue, but at least I think I can say that the errors in the log point to the Coqui process not starting / stopping correctly.

In combination with an earlier error, my current idea is that maybe the downloaded models are not loadable, which crashes the process.

It would be nice if you could check whether you have the folder Plugins\coqui_tts_plugin\tts\tts_models--multilingual--multi-dataset--xtts_v2, and whether it contains 5 files with a total size of 1.74 GB.

Since the pth files are basically just zip files containing the model data, and an earlier error pointed to PyTorch not being able to read a zip file, I suspect that some files are corrupted.

You can try deleting the above-mentioned folder (or other model folders in the Plugins\coqui_tts_plugin\tts\ path) and let it redownload. You can check the download in the log if you do.
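The manual check described above (file count, total size, and whether the pth files are readable as zip archives) could be sketched in Python as follows. This is only an illustration using the standard library, not part of the plugin: `inspect_model_folder` is a hypothetical helper, and it assumes the pth checkpoints use PyTorch's zip-based format, so an older non-zip checkpoint would be flagged even if it is intact. The folder path is the one quoted in this thread and may differ on your installation.

```python
import zipfile
from pathlib import Path


def inspect_model_folder(folder):
    """Return (file_count, total_bytes, suspect_pth_names) for a model folder."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)

    suspect = []
    for path in files:
        if path.suffix != ".pth":
            continue
        # Assumption: the checkpoint is a zip archive (PyTorch's newer format).
        if not zipfile.is_zipfile(path):
            suspect.append(path.name)
            continue
        try:
            with zipfile.ZipFile(path) as archive:
                # testzip() returns the first member with a bad CRC, or None.
                if archive.testzip() is not None:
                    suspect.append(path.name)
        except zipfile.BadZipFile:
            suspect.append(path.name)
    return len(files), total_bytes, sorted(suspect)


if __name__ == "__main__":
    # Path taken from the thread; adjust it to your installation.
    folder = Path(r"Plugins\coqui_tts_plugin\tts\tts_models--multilingual--multi-dataset--xtts_v2")
    if folder.is_dir():
        count, size, bad = inspect_model_folder(folder)
        print(f"{count} files, {size / 1024**3:.2f} GiB, suspect pth files: {bad}")
    else:
        print(f"Folder not found: {folder}")
```

If a pth file shows up as suspect, deleting the model folder and letting the plugin redownload it, as suggested above, is the likely fix.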

(I am working on a better way to show it outside the log.)

See also this issue I found about this possible problem: https://github.com/coqui-ai/TTS/issues/3605

Johell1NS commented 5 months ago

Hi, by reinstalling the entire plugin I managed to get it working, even if it doesn't seem very stable (I'm trying the xtts_v2 model). It accepts a short string; otherwise it gives a warning. But even with short strings, the first playback seems to go well, and then if I repeat it, perhaps after changing the voice, it starts to say nonsense words. Basically, every time I want to play something new I have to restart the backend. Another error I found is that it pronounces punctuation marks.

Sharrnah commented 5 months ago

Hi. Nice that you got it working.

It should split the text into chunks for the TTS model to handle, so I'm not sure what warning it would display. Can you post the warning you are seeing here?
About the nonsense words: are you using it together with Speech2Text? It could be that Speech2Text is picking up some environmental noise, which often results in hallucinations. You can increase the VAD Speech confidence and Speech volume Level to compensate for that. (Optionally with the AI Denoise option, though that does not influence the recording trigger.)
The issue with punctuation marks is a known issue with XTTS, for which I also added some findings here: https://github.com/coqui-ai/TTS/issues/3598 (though it seems every language behaves a bit differently here). German even adds more than just punctuation. So if that's the only issue with Italian, I could add a filter to remove punctuation before sending the text to the TTS model. But I might then have to re-add the sentence splitting myself, since it can't split at sentence boundaries anymore.

Johell1NS commented 5 months ago

Hi, I disabled Speech2Text and the hallucinations went away.

As for the warning, I'm sending a screenshot because I can't copy and paste from the log. P.S. Why do I see the log file in Japanese or other East Asian characters?

As for punctuation, in my tests only the period (.) confuses it, so I'm replacing it with a semicolon (;). In my opinion, punctuation marks cannot be removed, because they help give the sentence the rhythm we want when it is pronounced.
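The period-to-semicolon workaround described above could be sketched like this. It is a minimal illustration, not code from the plugin: `soften_periods` is a hypothetical helper, and it assumes that only sentence-ending periods should be replaced, leaving decimal numbers and ellipses alone.

```python
import re

# A single period that ends a sentence: not part of "..." and followed by
# whitespace or the end of the text (so "3.5" is left untouched).
_SENTENCE_PERIOD = re.compile(r"(?<!\.)\.(?=\s|$)")


def soften_periods(text: str) -> str:
    """Replace sentence-ending periods with semicolons before sending text to TTS."""
    return _SENTENCE_PERIOD.sub(";", text)


print(soften_periods("Ciao. Costa 3.5 euro. Aspetta... va bene."))
```

Keeping the other punctuation marks intact preserves the pauses that shape the rhythm of the spoken sentence.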

Sharrnah commented 5 months ago

I played with it a bit yesterday too and got that warning. I guess either Coqui has no integrated sentence splitting, or it has but still warns (since it can sometimes be troublesome to split at the right places).

I could implement custom sentence splitting if that is really causing trouble, but I had no issue generating a rather long text.
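For illustration, a custom sentence splitter of the kind mentioned above might look roughly like the following. This is only a sketch under stated assumptions, not the plugin's or Coqui's actual implementation: `split_into_chunks` is a hypothetical helper, the 200-character limit is an arbitrary example value, and the boundary rule (split after ., !, ? or ; followed by whitespace) is deliberately simple.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Split text at sentence boundaries and pack sentences into chunks
    of at most max_chars characters (a single longer word is kept whole)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?;])\s+", text) if s.strip()]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # The sentence does not fit in the current chunk: start a new one and,
        # if the sentence itself is too long, break it at word boundaries.
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate) > max_chars and current:
                chunks.append(current)
                current = word
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks


for chunk in split_into_chunks("Uno. Due! Tre? Quattro;", max_chars=10):
    print(chunk)
```

Each chunk could then be sent to the TTS model separately, so that no single request exceeds the length the model is comfortable with.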

Regarding the punctuation: if it has no issue with other types of punctuation, I could add a replace function for these. But I agree that they are important for pronunciation.

I don't know why you see some Japanese characters. It could be that it is confused by some special characters I also saw in the log file you posted. I wouldn't worry too much about it. :)

--

I also worked on and improved the sentence splitting for Bark yesterday. I have a beta version of the Bark plugin with it integrated. The official release will come with the next Whispering Tiger release, or I can post it here (or you can find it in the Discord server) in case you are interested in trying it out.

Johell1NS commented 5 months ago

Thanks so much for the support!

Sharrnah / whispering-ui

Text-to-Speech in Italian? #22