KoljaB / RealtimeTTS

Converts text to speech in realtime

Adding Google Cloud Text To Speech Engine #97

Open sitatec opened 2 days ago

sitatec commented 2 days ago

Hi @KoljaB, thanks for this amazing repo. Great work 👍🏾, really!

If I create an engine that implements the BaseEngine methods and simply generates audio for every text passed to the synthesize method, would that work?

My goal is to take LLM output chunks and use the Google Cloud TTS API (not Google Translate) to generate audio in real time.

KoljaB commented 2 days ago

That should work. Actually not a bad idea to support Google Cloud TTS as an engine.

You need to put the audio chunks into self.queue like in all other engines. Make sure the get_stream_info method returns the information needed about format, number of audio channels and sample rate. Look at gtts_engine.py and system_engine.py; these are quite simple engines that you can take as examples.
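To make the shape of that concrete, here is a minimal standalone sketch of the pattern described above: synthesize pushes raw audio chunks into self.queue, and get_stream_info reports format, channels and sample rate. Everything named here is an assumption for illustration: SketchEngine is a stand-in rather than a real BaseEngine subclass, tts_call is a hypothetical callable where the Google Cloud TTS request would go, CHUNK_BYTES is arbitrary, and the "int16" label is a placeholder for whatever format constant the library actually expects. Check gtts_engine.py and system_engine.py for the real signatures.

```python
import queue
from typing import Callable, Tuple

CHUNK_BYTES = 4096  # hypothetical chunk size, not taken from the library


class SketchEngine:
    """Stand-in for an engine; a real one would subclass BaseEngine."""

    def __init__(self, tts_call: Callable[[str], bytes], sample_rate: int = 24000):
        # The thread says audio chunks go into self.queue.
        self.queue: "queue.Queue[bytes]" = queue.Queue()
        # Hypothetical hook: text in, raw 16-bit mono PCM bytes out.
        self.tts_call = tts_call
        self.sample_rate = sample_rate

    def get_stream_info(self) -> Tuple[str, int, int]:
        # (format, channels, sample rate); "int16" is a placeholder label.
        return ("int16", 1, self.sample_rate)

    def synthesize(self, text: str) -> bool:
        pcm = self.tts_call(text)
        if not pcm:
            return False
        # Split the audio into fixed-size chunks and enqueue them in order.
        for start in range(0, len(pcm), CHUNK_BYTES):
            self.queue.put(pcm[start:start + CHUNK_BYTES])
        return True


def fake_tts(text: str) -> bytes:
    # Stand-in for the cloud request: 5000 samples of dummy 16-bit audio.
    return b"\x00\x01" * 5000


engine = SketchEngine(fake_tts)
ok = engine.synthesize("Hello from an LLM chunk.")
chunk_sizes = []
while not engine.queue.empty():
    chunk_sizes.append(len(engine.queue.get()))
```

With the dummy fake_tts above, the 10000 bytes of audio end up as three queue entries. In a real engine the callable would wrap the cloud request, and the sample rate returned by get_stream_info would have to match what the API was asked to produce.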

Ask me if you run into probs. PR would be huge if it's finished ;)

sitatec commented 1 day ago

I will try to implement it and open a PR. I have one more question: I previously tried generating 3 sentences separately in the Google Cloud Console, downloading the audio files, and merging them. But it was noticeable that they had been merged; it wasn't as smooth as when I generated the 3 sentences together. Does your library handle this case to make it sound natural (not like concatenated audio)?

KoljaB commented 1 day ago

That depends on why it does not sound natural. Everything you want to do to make it sound "more natural" should be done in the synthesize method. For CoquiEngine, for example, silence is added between the detected sentences to make it sound more natural.
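As a rough illustration of the silence idea, the sketch below joins separately synthesized sentences with a short pause between them. It assumes raw signed 16-bit PCM, where silence is simply zero bytes; join_with_silence and the 100 ms default are hypothetical and not taken from CoquiEngine, whose actual implementation may differ.

```python
from typing import List


def join_with_silence(
    segments: List[bytes],
    sample_rate: int,
    pause_ms: int = 100,     # hypothetical pause length
    channels: int = 1,
    sample_width: int = 2,   # bytes per sample; 2 for 16-bit PCM
) -> bytes:
    """Concatenate raw PCM segments with a block of silence between them."""
    # Number of audio frames covered by the pause (integer arithmetic).
    n_frames = sample_rate * pause_ms // 1000
    # Zero bytes are silence for signed PCM.
    silence = b"\x00" * (n_frames * channels * sample_width)
    return silence.join(segments)


# Three dummy "sentences" of 10 samples each at 24 kHz.
sentences = [b"\x01\x00" * 10, b"\x02\x00" * 10, b"\x03\x00" * 10]
merged = join_with_silence(sentences, sample_rate=24000)
```

In an engine, something like this would run inside synthesize before the audio is put into the queue. Whether a fixed pause is enough depends on why the joins are audible; abrupt cuts or prosody differences between separately generated sentences would need other handling.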