KoljaB / RealtimeTTS

Converts text to speech in realtime

Using Coqui engine play_async: Invalid output device error #41

Open jacobtang opened 4 months ago

jacobtang commented 4 months ago

Thanks for the upgrade to RealtimeTTS v0.3.42. However, when using the Coqui engine with play_async on a Linux Ubuntu Server environment, I cannot get the callback data.

```python
stream = TextToAudioStream(engine, log_characters=True).feed(translation_stream)
stream.play_async(
    tokenizer="stanza",
    language="zh",
    on_audio_chunk=on_audio_chunk_callback,
    muted=True,
)
```

```text
error in play() with engine coqui: [Errno -9996] Invalid output device (no default output device)
Traceback: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/text_to_stream.py", line 254, in play
    self.player.start()
  File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/stream_player.py", line 269, in start
    self.audio_stream.open_stream()
  File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/stream_player.py", line 68, in open_stream
    self.stream = self.pyaudio_instance.open(
  File "/usr/local/lib/python3.10/dist-packages/pyaudio/__init__.py", line 639, in open
    stream = PyAudio.Stream(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pyaudio/__init__.py", line 441, in __init__
    self._stream = pa.open(**arguments)
OSError: [Errno -9996] Invalid output device (no default output device)
```

KoljaB commented 4 months ago

PyAudio can't open an output stream because it can't find a valid sound output device, so there seems to be an issue with the audio output device configuration. This should be a PyAudio/audio configuration issue, independent of RealtimeTTS, so the version used or the TTS engine should not matter. I'm absolutely no Linux expert, but maybe it has something to do with the PortAudio drivers?
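For anyone hitting this on a headless server, a quick way to see what PyAudio can actually find is to enumerate the host's devices directly. This is a diagnostic sketch independent of RealtimeTTS, using only standard PyAudio calls:

```python
# Diagnostic sketch: list what PortAudio/PyAudio can see on this machine.
import pyaudio

pa = pyaudio.PyAudio()
try:
    print("Default output device:", pa.get_default_output_device_info())
except IOError as err:
    # This is the condition behind Errno -9996: no default output device.
    print("No default output device:", err)

for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info.get("maxOutputChannels", 0) > 0:
        print(f"Output device {i}: {info['name']}")

pa.terminate()
```

If no output devices show up at all, the fix is likely at the ALSA/PortAudio level rather than in the Python code.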

jacobtang commented 4 months ago

Thanks a lot. I tried modifying RealtimeTTS/stream_player.py, which avoids this problem:

```python
self.stream = self.pyaudio_instance.open(
    format=pyFormat,
    channels=pyChannels,
    rate=pySampleRate,
    output=True,
)
```

Another question: with the Coqui engine the callback chunk size is 512, but with the OpenAI engine it is 1024. How can I get callback chunks of size 1024?

KoljaB commented 4 months ago

You could just concatenate two subsequent chunks; then you have size 1024.

stream_player.py splits the chunks coming from the engines into smaller ones with a maximum size of 1024. This is done to enable immediate stopping of playback. If the chunks come in smaller, it does not accumulate them though, because that would add latency and raise the question of what to do with the last (partial) chunk. You would need to do that yourself by concatenating.
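A minimal sketch of that concatenation done inside the callback itself. The names (on_audio_chunk_callback, handle_block) and the target size of 1024 are just placeholders taken from this thread, not RealtimeTTS API:

```python
# Regroup the incoming audio chunks into fixed 1024-byte blocks.
TARGET_SIZE = 1024
_buffer = bytearray()

def on_audio_chunk_callback(chunk):
    """Receives raw audio chunks from the stream and regroups them."""
    _buffer.extend(chunk)
    while len(_buffer) >= TARGET_SIZE:
        block = bytes(_buffer[:TARGET_SIZE])
        del _buffer[:TARGET_SIZE]
        handle_block(block)
    # Note: whatever remains in _buffer when synthesis ends is the
    # "last chunk" problem mentioned above - flush it manually if needed.

def handle_block(block):
    """Placeholder for whatever consumes the 1024-byte blocks."""
    ...
```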

jacobtang commented 4 months ago

I haven't tested the Coqui engine before. Does it need to run in a GPU environment to keep speech generation realtime? Currently, using the callback chunk data as the audio input, the OpenAI engine plays normally, but the Coqui engine has intermittent playback and data loss.

KoljaB commented 4 months ago

Yes, CoquiEngine needs GPU acceleration. It needs ~4 GB of VRAM. It runs in realtime on my RTX 2080, but it is close - the realtime factor is just a bit below 1. It will be faster with DeepSpeed enabled, but it still needs a GPU.
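For reference, switching the earlier snippet to the Coqui engine with DeepSpeed might look roughly like the sketch below. Treat the use_deepspeed flag name as an assumption and check the current CoquiEngine signature; the Chinese test sentence and the no-op callback are also just placeholders:

```python
from RealtimeTTS import TextToAudioStream, CoquiEngine

def on_audio_chunk_callback(chunk):
    pass  # forward/collect the raw audio chunks here

# Flag name is an assumption; requires a CUDA GPU with ~4 GB VRAM.
engine = CoquiEngine(use_deepspeed=True)

stream = TextToAudioStream(engine, log_characters=True)
stream.feed("你好，世界。")
stream.play_async(
    tokenizer="stanza",
    language="zh",
    on_audio_chunk=on_audio_chunk_callback,
    muted=True,
)
# In a real script, wait for synthesis to finish before exiting.
```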