KoljaB / RealtimeTTS

Converts text to speech in realtime

pvporcupine use may be outdated #104

Open coding-garden-1 opened 1 month ago

coding-garden-1 commented 1 month ago

Traceback (most recent call last):
  File "/home/d/Documents/projects/voice girlification/RealtimeSTT/RealtimeSTT/audio_recorder.py", line 523, in __init__
    self.porcupine = pvporcupine.create(
                     ^^^^^^^^^^^^^^^^^^^
TypeError: create() missing 1 required positional argument: 'access_key'
Traceback (most recent call last):
  File "/home/d/Documents/projects/voice girlification/RealtimeSTT/test3.py", line 19, in <module>
    recorder = AudioToTextRecorder(
               ^^^^^^^^^^^^^^^^^^^^
  File "/home/d/Documents/projects/voice girlification/RealtimeSTT/RealtimeSTT/audio_recorder.py", line 523, in __init__
    self.porcupine = pvporcupine.create(
                     ^^^^^^^^^^^^^^^^^^^
TypeError: create() missing 1 required positional argument: 'access_key'

(test3.py is simply the realtime_tts_loop script modified to use the ElevenlabsEngine.)

All I want is to say something, transcribe it with STT, and then play it back through TTS in a different voice.

coding-garden-1 commented 1 month ago

I was able to work around this by switching to OWW (openWakeWord), commenting out the OWW inference code, and making the OWW model just Model() with no arguments. It printed "speak now", I spoke, and it transcribed, but then it just replied with the system prompt in a very robotic voice and had no output. I wish this were easier :sob:

coding-garden-1 commented 1 month ago

Sorry, by "it had no output" I mean it froze at that point.

coding-garden-1 commented 1 month ago

My bad: the robotic voice was my SystemEngine, and the "just repeating the system prompt" was from a modification I accidentally made. The only issue I'm having now is that it can only talk one time, and then it's done. This is the code I'm using:

import os
from RealtimeTTS import TextToAudioStream, ElevenlabsEngine, SystemEngine
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    # Text-to-Speech Stream Setup
    stream = TextToAudioStream(
        SystemEngine(),
        log_characters=True
    )

    # Speech-to-Text Recorder Setup
    recorder = AudioToTextRecorder(
        model="medium",
        language="en",
        wake_words="Jarvis",
        spinner=True,
        wake_word_activation_delay=5
    )

    def main():
        """Main loop for interaction."""
        while True:
            # Capture user input from microphone
            user_text = recorder.text().strip()

            if not user_text:
                continue

            print(f'>>> {user_text}\n<<< ', end="", flush=True)

            # Respond with the same user input sarcastically
            stream.feed(f"Oh, what an insightful statement, '{user_text}', master.").play()

    if __name__ == "__main__":
        main()

Very impressed with your code.

KoljaB commented 1 month ago

Modified the code a bit to test it:

from RealtimeTTS import TextToAudioStream, ElevenlabsEngine, SystemEngine
from RealtimeSTT import AudioToTextRecorder

def main():
    """Main loop for interaction."""
    # Text-to-Speech Stream Setup
    stream = TextToAudioStream(
        SystemEngine(),
        log_characters=True
    )

    # Speech-to-Text Recorder Setup
    recorder = AudioToTextRecorder(
        model="medium",
        language="en",
        wake_words="Jarvis",
        spinner=True,
        wake_word_activation_delay=5
    )

    while True:
        # Capture user input from microphone
        print("Speak now")
        user_text = recorder.text().strip()

        if not user_text:
            continue

        print(f'>>> {user_text}\n<<< ', end="", flush=True)

        # Respond with the same user input sarcastically
        stream.feed(f"Oh, what an insightful statement, '{user_text}', master.").play()

if __name__ == "__main__":
    main()

It works multiple times in my environment:

(test_env) D:\Projekte\RealtimeSTT\tests_private>python 1test.py
Speak now
>>> Hey there, this is a test.
<<< Oh, what an insightful statement, 'Hey there, this is a test.', master.
Speak now
>>> Hey, what's going on?
<<< Oh, what an insightful statement, 'Hey, what's going on?', master.
Speak now

What happens after the speech is finished? Do you see the spinner output "- speak now" the second time?

By the way, your Porcupine problem ("TypeError: create() missing 1 required positional argument: 'access_key'") is probably the result of updating pvporcupine to the latest version, which requires a subscription and therefore an access_key. If you use pvporcupine==1.9.5, the version from the RealtimeSTT requirements, it should work.
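The version explanation above can be checked from Python before constructing the recorder. This is a minimal sketch, not part of RealtimeSTT or pvporcupine: the helpers needs_access_key and check_installed_porcupine are hypothetical, and the code assumes that 2.0 is the first pvporcupine release whose create() requires an access_key (1.9.5 being the last 1.x release that does not).

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: pvporcupine 2.x and later require an access_key in create();
# 1.9.5 (from RealtimeSTT's requirements) does not.
ACCESS_KEY_MAJOR = 2


def needs_access_key(ver: str) -> bool:
    """Return True if a pvporcupine version string is 2.x or newer."""
    major = int(ver.split(".")[0])
    return major >= ACCESS_KEY_MAJOR


def check_installed_porcupine() -> str:
    """Describe whether the installed pvporcupine (if any) needs an access_key."""
    try:
        ver = version("pvporcupine")  # reads installed package metadata
    except PackageNotFoundError:
        return "pvporcupine is not installed"
    if needs_access_key(ver):
        return (f"pvporcupine {ver} requires an access_key; "
                "downgrade with: pip install pvporcupine==1.9.5")
    return f"pvporcupine {ver} does not require an access_key"


if __name__ == "__main__":
    print(check_installed_porcupine())
```

Running it before starting the recorder surfaces the mismatch without waiting for AudioToTextRecorder to raise the TypeError.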

coding-garden-1 commented 1 month ago

Okay, interesting. I was still having a lot of trouble with it (I haven't tried your code yet). The pvporcupine explanation makes sense; I will try that.

The "insightful comment, master" line was OpenAI misunderstanding what I wanted from the code, but I was too exhausted to fix it, lmfao. I just want the transcribed text to go directly out to TTS. I'm going to try your code in a little bit to see if it works; I'll keep you posted.
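For that direct path, the loop in the code above only needs to hand the transcription to the stream unchanged instead of wrapping it in the sarcastic template. A minimal sketch, assuming nothing beyond the recorder.text() and stream.feed(...).play() calls already used in this thread; echo_loop is a hypothetical helper that takes those two calls as plain functions so the logic can run without a microphone or TTS engine.

```python
from typing import Callable, Optional


def echo_loop(
    get_text: Callable[[], str],
    speak: Callable[[str], None],
    max_turns: Optional[int] = None,
) -> int:
    """Speak each transcription back verbatim (hypothetical helper).

    get_text: returns the next transcription (e.g. recorder.text).
    speak: plays a string through TTS (e.g. lambda t: stream.feed(t).play()).
    max_turns: stop after this many spoken replies; None loops forever.
    Returns the number of replies spoken.
    """
    turns = 0
    while max_turns is None or turns < max_turns:
        user_text = get_text().strip()
        if not user_text:
            continue  # skip empty transcriptions
        print(f">>> {user_text}", flush=True)
        speak(user_text)  # no prompt wrapping: the text goes straight to TTS
        turns += 1
    return turns
```

Inside KoljaB's main(), the while loop could be replaced with echo_loop(recorder.text, lambda t: stream.feed(t).play()). Swapping SystemEngine() for an ElevenlabsEngine configured with your own API key and voice is what would give the different voice.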