KoljaB / RealtimeTTS

Converts text to speech in realtime

Support multiple languages per engine #116

Closed she7ata7 closed 1 month ago

she7ata7 commented 1 month ago

Can you make the same module support more than one language at the same time? For example, with the GTTS engine I can only pass a single language, not a list of languages:

def __init__(self,
             language: str = 'en',
             tld: str = 'com',
             chunk_length: int = 100,
             crossfade_length: int = 10,
             speed_increase: float = 1.0):
    self.language = language
    self.tld = tld
    self.chunk_length = chunk_length
    self.crossfade_length = crossfade_length
    self.speed_increase = speed_increase
KoljaB commented 1 month ago

gTTS does not support this:

https://gtts.readthedocs.io/en/latest/module.html#localized-accents

she7ata7 commented 1 month ago

which engine does support this feature?

KoljaB commented 1 month ago

OpenAI has it by default, Elevenlabs when using multilingual models, and Azure when using one of the multilingual voices (RyanMultilingualNeural etc.). Coqui, gtts and the system engines can't do that, afaik.

she7ata7 commented 1 month ago

Thanks, but when I tried to use German I got these errors. Do you know why?

RealTimeSTT: faster_whisper - WARNING - The current model is English-only but the language parameter is set to 'de'; using 'en' instead.
WARNING:faster_whisper:The current model is English-only but the language parameter is set to 'de'; using 'en' instead.
recorder_config = {
    'use_microphone': False,
    'spinner': False,
    'model': 'large-v2',
    'language': "de",
    'silero_sensitivity': 0.4,
    'webrtc_sensitivity': 2,
    'post_speech_silence_duration': 0.4,
    'min_length_of_recording': 0,
    'min_gap_between_recordings': 0,
    'enable_realtime_transcription': True,
    'realtime_processing_pause': 0.2,
    'realtime_model_type': 'tiny.en',
}
KoljaB commented 1 month ago

This is your RealtimeSTT input configuration. I thought we were talking about TTS. Your problem is 'realtime_model_type': 'tiny.en'. Both models must support German. Use tiny without the ".en", or small or medium. (By the way, we can gladly talk in German too.)

she7ata7 commented 1 month ago

Sorry about the confusion :confused:, this is the TTS module call (because I'm using both TTS with the GTTS engine and STT):

self.stream.play(output_wavfile=file_path, muted=True, language="de")

For STT:

recorder_config = {
    'use_microphone': False,
    'spinner': False,
    'model': 'large-v2',
    'language': "de",
    'silero_sensitivity': 0.4,
    'webrtc_sensitivity': 2,
    'post_speech_silence_duration': 0.4,
    'min_length_of_recording': 0,
    'min_gap_between_recordings': 0,
    'enable_realtime_transcription': True,
    'realtime_processing_pause': 0.2,
    'realtime_model_type': 'tiny',
}

But there is a problem: the voice in the generated WAV file is not German! Am I missing something?

KoljaB commented 1 month ago

Use language="de" on the GTTSEngine constructor. The language parameter of the play methods only affects sentence splitting.

she7ata7 commented 1 month ago

Sometimes I get this error, but I don't know why. It happens only when I use de instead of en:

Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/legionrtx/Desktop/ai_assistant/sip/AudioMediaPort.py", line 65, in recorder_thread
    stt_sentence = self.recorder.text()
  File "/home/legionrtx/.local/lib/python3.10/site-packages/RealtimeSTT/audio_recorder.py", line 894, in text
    return self.transcribe()
  File "/home/legionrtx/.local/lib/python3.10/site-packages/RealtimeSTT/audio_recorder.py", line 845, in transcribe
    status, result = self.parent_transcription_pipe.recv()
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
EOFError: Ran out of input
Exception in thread Thread-54 (recorder_thread):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/legionrtx/Desktop/ai_assistant/sip/AudioMediaPort.py", line 65, in recorder_thread
    stt_sentence = self.recorder.text()
  File "/home/legionrtx/.local/lib/python3.10/site-packages/RealtimeSTT/audio_recorder.py", line 894, in text
    return self.transcribe()
  File "/home/legionrtx/.local/lib/python3.10/site-packages/RealtimeSTT/audio_recorder.py", line 845, in transcribe
    status, result = self.parent_transcription_pipe.recv()
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
_pickle.UnpicklingError: invalid load key, '\x00'.
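For context: both messages come straight from pickle decoding whatever bytes arrived on the multiprocessing pipe. "Ran out of input" means the buffer was empty or truncated, and "invalid load key" means it started with bytes that are not a valid pickle opcode. Either way this points at the child process dying or the stream getting corrupted, not at the object being sent. A minimal stdlib reproduction of both messages (no RealtimeSTT involved):

```python
import pickle

# "Ran out of input": pickle was handed an empty/truncated buffer,
# e.g. because the transcription child process died before replying.
try:
    pickle.loads(b"")
except EOFError as e:
    print("EOFError:", e)

# "invalid load key, '\x00'.": the buffer starts with bytes that are
# not a valid pickle opcode, i.e. the stream got corrupted.
try:
    pickle.loads(b"\x00\x00")
except pickle.UnpicklingError as e:
    print("UnpicklingError:", e)
```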

This is the code:

def recorder_thread(self):
    SipEndpoint.ep.libRegisterThread("recorder_thread")
    self.recorder_ready.set()
    while True:
        stt_sentence = self.recorder.text()
        print(f"\r STT_sentence: {stt_sentence}")

        # Time the GPT call itself (start must be taken before the request)
        gpt_start_time = time.time()
        gpt_response_chunks = split_text_chunks(self.gpt.chat_completion(stt_sentence))
        gpt_end_time = time.time()
        print(f"\r GPT_response: {gpt_response_chunks}")
        print(f"Latency of GPT is: {gpt_end_time - gpt_start_time} seconds")

        # Convert text chunks to WAV files and play them over the call
        text_queue = Queue()
        for text in gpt_response_chunks:
            text_queue.put(text)

        while not text_queue.empty():
            text = text_queue.get()
            wav_file = self.tts.command_to_wav(
                command=text,
                file_path=f"./auto_generated/TTS_GTTSEngine_{int(time.time() * 1000)}.wav")
            self.current_playing_wav = wav_file
            self.player = pj.AudioMediaPlayer()
            self.player.createPlayer(wav_file, pj.PJMEDIA_FILE_NO_LOOP)
            aud_med = self.current_call.getAudioMedia(-1)
            self.player.startTransmit(aud_med)

            time.sleep(0.1)
            while self.player.getPos() != 0:
                ci = self.current_call.getInfo()
                if ci.state == pj.PJSIP_INV_STATE_DISCONNECTED:
                    break
                print("getPos:: " + str(self.player.getPos()))
                time.sleep(0.1)
she7ata7 commented 1 month ago

For TTS:

self.voice = GTTSVoice(speed_increase=voice_speed_increase, language=MODEL_LANGUAGE)
self.engine = GTTSEngine(self.voice)
self.stream = TextToAudioStream(self.engine)

For STT:

recorder_config = {
    'use_microphone': False,
    'spinner': False,
    'model': 'large-v2',
    'language': MODEL_LANGUAGE,
    'silero_sensitivity': 0.4,
    'webrtc_sensitivity': 2,
    'post_speech_silence_duration': 0.4,
    'min_length_of_recording': 0,
    'min_gap_between_recordings': 0,
    'enable_realtime_transcription': True,
    'realtime_processing_pause': 0.2,
    'realtime_model_type': 'tiny',
}
KoljaB commented 1 month ago

There is not much code in recorder_thread that is related to my libs, so it's hard to say. In my experience, most cases where I had issues with pickle and got UnpicklingErrors I could track down to bugs in the surrounding code (and not a bug in the lib itself). "Sometimes I get this error" is also really hard to track down; it's easier when there's a workflow that can reproduce it reliably.

she7ata7 commented 1 month ago

so what about this issue? Do you have any idea about it?

WARNING:root:engine coqui failed to synthesize sentence "yes you are right" with error: [Errno 32] Broken pipe
RealTimeSTT: root - WARNING - engine coqui failed to synthesize sentence "yes you are right" with error: [Errno 32] Broken pipe
Traceback: Traceback (most recent call last):
  File "/home/legionrtx/.local/lib/python3.10/site-packages/RealtimeTTS/text_to_stream.py", line 343, in synthesize_worker
    success = self.engine.synthesize(sentence)
  File "/home/legionrtx/.local/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 782, in synthesize
    self.send_command('synthesize', data)
  File "/home/legionrtx/.local/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 653, in send_command
    self.parent_synthesize_pipe.send(message)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
she7ata7 commented 1 month ago

I tried the solution from https://github.com/KoljaB/RealtimeTTS/issues/85#issuecomment-2121210420 ("The quickest fix is to go back to an older transformers version: pip install transformers==4.38.2"), but I still got the same issue.

KoljaB commented 1 month ago

Same thing. BrokenPipeError and pickling errors are in most cases problems of the surrounding code, especially if the unchanged example codes work. I think you should check all your other code: insert lots of prints and try/except etc. to find where the problem is. If you really think it's RealtimeTTS, then please try to reduce it to a small bit of working code that mostly only uses RealtimeTTS and post the full code that reproduces the problem; then I will look into it. But as I said, in 99% of cases it is NOT the RealtimeTTS library itself that causes this problem. In 99% of cases the broken pipe is just a reaction to a process not responding because some other kind of error occurred there.
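One cheap way to follow that advice: wrap each thread target so exceptions are recorded instead of silently killing the thread. A sketch (`run_logged` and `errors` are illustrative names, not part of RealtimeSTT or RealtimeTTS):

```python
import threading
import traceback

errors = []  # exceptions captured from worker threads

def run_logged(target, *args):
    # Wrap a thread target so an uncaught exception is recorded and
    # printed instead of silently terminating the thread.
    def wrapper():
        try:
            target(*args)
        except Exception as exc:
            errors.append(exc)
            traceback.print_exc()
    t = threading.Thread(target=wrapper)
    t.start()
    return t

def flaky_worker():
    raise RuntimeError("boom")

t = run_logged(flaky_worker)
t.join()
print("captured:", errors)  # the RuntimeError ends up here, not lost
```

With recorder_thread started through a wrapper like this, an error raised inside the loop would show up in the log instead of just leaving a dead thread and, later, a broken pipe.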

she7ata7 commented 1 month ago

I think it's a threading issue, because I launch STT in a different thread and in that same thread I try to generate a WAV using the Coqui engine, and I get these errors (the previous ones). But when I use GTTS, it works fine.

recorder_thread = threading.Thread(target=self.recorder_thread)
recorder_thread.start()
self.recorder_ready.wait()

def recorder_thread(self):
    global recorder
    print("Initializing RealtimeSTT...")
    recorder = AudioToTextRecorder(**self.recorder_config)
    print("RealtimeSTT initialized")
    self.recorder_ready.set()
    while True:
        stt_sentence = recorder.text()
        # This line throws the threading-related exception below
        tts_response_wav = self.tts.command_to_wav(command="yes you are right",

Coqui Engine class:

class TTS_CoquiEngine:

    def __init__(self):
        self.engine = CoquiEngine()
        self.stream = TextToAudioStream(self.engine)

    def __dummy_generator(self):
        yield "Hey guys! These here are realtime spoken sentences based on local text synthesis."

    def play_stream(self):
        print("Starting to play stream")
        self.stream.feed(self.__dummy_generator()).play(log_synthesized_text=True)

    def command_to_stream(self, text, on_audio_chunk_callback):
        self.stream.feed(text)
        self.stream.play(on_audio_chunk=on_audio_chunk_callback, muted=True)

    def command_to_wav(self, command, file_path):
        self.stream.feed(command)
        self.stream.play(output_wavfile=file_path, muted=True)
        return file_path

    def shutdown(self):
        self.engine.shutdown()

The exception:

RealTimeSTT: root - WARNING - engine coqui failed to synthesize sentence "yes you are right" with error: [Errno 32] Broken pipe
Traceback: Traceback (most recent call last):
  File "/home/legionrtx/.local/lib/python3.10/site-packages/RealtimeTTS/text_to_stream.py", line 343, in synthesize_worker
    success = self.engine.synthesize(sentence)
  File "/home/legionrtx/.local/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 782, in synthesize
    self.send_command('synthesize', data)
  File "/home/legionrtx/.local/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 653, in send_command
    self.parent_synthesize_pipe.send(message)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
KoljaB commented 1 month ago

If you replace that line: tts_response_wav = self.tts.command_to_wav(command="yes you are right", .. with a time.sleep(10) to block that thread, does it then happen too?

KoljaB commented 1 month ago

Maybe the generation pulls so many resources that RealtimeSTT somehow fails. But there shouldn't be much processing at that point, after the text call.

she7ata7 commented 1 month ago

If you replace that line: tts_response_wav = self.tts.command_to_wav(command="yes you are right", .. with a time.sleep(10) to block that thread, does it then happen too?

I tried this and got no error. So why does this happen only with the Coqui engine?

KoljaB commented 1 month ago

CoquiEngine is the only one using pickles for multiprocessing. I know this does not explain the error itself.

she7ata7 commented 1 month ago

Do you have any other ideas about what to do? I also tried this approach to make the Coqui engine run in the main thread, but I still get the same errors :disappointed: https://stackoverflow.com/questions/18989446/execute-python-function-in-main-thread-from-call-in-dummy-thread

KoljaB commented 1 month ago

You can send me the whole code file you are using to my email address kolja.beigel@web.de and I can try to reproduce it. Or even better: try to break it down to the least possible code needed to reproduce it and send me that. Then I'll look into it when I find time (might take some days, since this costs me time).

she7ata7 commented 1 month ago

I want to say thanks so much for offering to help (I just migrated to using Coqui TTS) :smiley: