KoljaB / RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
MIT License

Default stt-server/stt client not working #141

Closed: jowparks closed this issue 2 weeks ago

jowparks commented 3 weeks ago

Trying to use the default server/client packaged with RealtimeSTT, and it doesn't seem to work.

I ran pip install RealTimeSTT.

Then, in one shell: stt-server (setup took a bit).

After the server started successfully, I ran stt in a different shell. It starts sending audio, but no transcription occurs. Log from the server:

RealtimeSTT initialized
VAD detection started
Server started. Press Ctrl+C to stop the server.
Control client connected
Data client connected
.....................................................................

From Client

Control WebSocket connection opened.
WebSocket connections established successfully.
Starting command processor
Data WebSocket connection opened.
Audio recording initialized successfully at 16000 Hz, device index 2
Recording and sending audio...
🔴

Tried with different microphones; the client seems to be sending the data, but transcription isn't occurring. All default settings.
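For reference, the client streams raw PCM chunks along with sample-rate metadata (the debug log later in this thread shows 2048-byte chunks tagged with {"sampleRate": 16000}). Below is a hypothetical sketch of one way such a chunk could be framed for transmission; the actual RealtimeSTT wire format may differ:

```python
import json
import struct

def frame_chunk(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Prefix JSON metadata (length-prefixed) to a raw PCM chunk."""
    meta = json.dumps({"sampleRate": sample_rate}).encode("utf-8")
    return struct.pack("<I", len(meta)) + meta + pcm

def unframe_chunk(frame: bytes) -> tuple[dict, bytes]:
    """Split a framed message back into metadata and PCM payload."""
    (meta_len,) = struct.unpack("<I", frame[:4])
    meta = json.loads(frame[4:4 + meta_len])
    return meta, frame[4 + meta_len:]
```

With a framing like this, a server that logs "2048 bytes, metadata: {"sampleRate": 16000}" per chunk can verify that audio is arriving even before any transcription happens.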

KoljaB commented 3 weeks ago

Some questions:

jowparks commented 3 weeks ago

MacOS MBP M1

The command gives me "False".

KoljaB commented 3 weeks ago

Thank you for reporting this. I don't own a Mac, so I hope you'll have a bit of patience with me while we solve this. The client/server code is still very new and not well tested.

Since the command gives you "False", the server is not using the GPU for transcription. That's normal for a Mac. The server will therefore react a bit slowly, but it should still produce some kind of realtime transcription within one or two seconds, and I'm currently unsure why that doesn't seem to happen at all.

I will need to add more logging to both server and client, especially an option for the server to log its chunk handling in detail. Will release a new version soon. I'll give both client and server an option to save the recorded and received audio chunks to a file, so we can see if the audio is correctly recorded and transmitted.

I think with this we should then have a good toolset to inspect the issue further and hopefully find out what goes wrong.

jowparks commented 3 weeks ago

Sounds good, I am happy to help once logging is in place. I am surprised the GPU isn't enabled on macOS; from what I was reading, it seems torch supports GPU acceleration on macOS now?

KoljaB commented 3 weeks ago

There's MPS support, which is similar to CUDA, but I think not every CUDA operation is implemented there. I don't know if faster-whisper supports it. Currently I don't call torch with mps in RealtimeSTT, but I could open up the opportunity to do so, so we can see if it works. By the way, on a Mac the check for GPU support is torch.backends.mps.is_available(), not torch.cuda.is_available(). Please take everything I say here with a grain of salt; I have very limited knowledge here, as I have no Mac.
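A quick way to see which backend torch reports on a given machine is a sketch like the following (this is just a diagnostic helper, not code from RealtimeSTT, which currently selects CUDA or CPU internally):

```python
def pick_device() -> str:
    """Return the best torch device name available on this machine."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch not installed; nothing to query
    if torch.cuda.is_available():          # NVIDIA GPUs (Linux/Windows)
        return "cuda"
    if torch.backends.mps.is_available():  # Apple Silicon GPUs (macOS)
        return "mps"
    return "cpu"

print(pick_device())
```

On an M1 MacBook Pro this would report "mps" rather than "cuda", which is why checking torch.cuda.is_available() alone prints False there.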

jowparks commented 3 weeks ago

Thanks, yes, you are right: torch.backends.mps.is_available() gives True on my MBP.

KoljaB commented 2 weeks ago

New version now has additional logging:

Additional parameters for the server:

--use_extended_logging, writes extensive log messages for the recording worker that processes the audio chunks
--debug, enables debug logging for detailed server operations
--logchunks, enables logging of incoming audio chunks (periods)
--writechunks, saves received audio chunks to a WAV file

Additional parameters for the client:

--debug, enables debug logging for detailed client operations
--writechunks, saves recorded audio chunks to a WAV file
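What a --writechunks option does can be approximated with the standard-library wave module. A minimal sketch, assuming 16-bit mono PCM at 16 kHz (the helper name write_chunks_to_wav is mine, not an actual RealtimeSTT function):

```python
import wave

def write_chunks_to_wav(path: str, chunks, sample_rate: int = 16000) -> None:
    """Write a sequence of raw 16-bit mono PCM chunks into one WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 16-bit samples
        wf.setframerate(sample_rate)
        for chunk in chunks:
            wf.writeframes(chunk)
```

Dumping the chunks on both the client and the server side lets you play back each file and hear whether the audio was recorded correctly and survived transmission intact.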

What I'd like to know now: if you start the server with "stt-server --debug --use_extended_logging --writechunks serverchunks.wav" and then start the client and start to talk:

jowparks commented 2 weeks ago

Not sure if something else changed in this version, but I tested and it is now kind of working with default params. All the log lines appear as you suggest, and the audio coming through to the server in the output file seems correct.

The first sentence I say after starting the client works, then the client errors out shortly after:

[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Received data message: {"type": "fullSentence", "text": "Should fail soon."}
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Message type: fullSentence
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Full sentence received: Should fail soon.
Should fail soon.
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Stopping client and cleaning up resources.
[DEBUG][2024-11-02 15:30:44][Thread-1 (run_forever)] WebSocket connection closed
[DEBUG][2024-11-02 15:30:44][Thread-1 (run_forever)] Close status code: None
[DEBUG][2024-11-02 15:30:44][Thread-1 (run_forever)] Close message: None
[DEBUG][2024-11-02 15:30:44][Thread-1 (run_forever)] WebSocket object: <websocket._app.WebSocketApp object at 0x14bbf0c50>
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Error processing data message: cannot join current thread
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] WebSocket connection closed
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Close status code: None
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Close message: None
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] WebSocket object: <websocket._app.WebSocketApp object at 0x14bbf1550>
[DEBUG][2024-11-02 15:30:44][Thread-4 (command_processor)] Command processor thread stopping
[DEBUG][2024-11-02 15:30:44][Thread-3 (record_and_send_audio)] Sending audio chunk 282: 2048 bytes, metadata: {"sampleRate": 16000}
[DEBUG][2024-11-02 15:30:44][Thread-3 (record_and_send_audio)] Error sending audio data: Connection is already closed.
[DEBUG][2024-11-02 15:30:44][Thread-3 (record_and_send_audio)] Cleaning up audio resources
[DEBUG][2024-11-02 15:30:44][Thread-3 (record_and_send_audio)] Stopping and closing audio stream
[DEBUG][2024-11-02 15:30:44][MainThread] Stopping client and cleaning up resources.
[DEBUG][2024-11-02 15:30:44][Thread-3 (record_and_send_audio)] Terminating PyAudio interface

KoljaB commented 2 weeks ago

Oh. You are right, I can reproduce that. I made an additional change that probably introduced this; I will fix it. The client is supposed to transcribe only a single sentence. That said, I realize it's probably best to add an option to make it transcribe continuously too.
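For context, the "Error processing data message: cannot join current thread" line in the client log is Python's built-in guard against a thread calling join() on itself; here the data-message handler thread apparently triggered the cleanup path that tries to join the very thread it is running on. A minimal repro of just that error (not the actual client code):

```python
import threading

def trigger_self_join() -> str:
    """Have a worker thread join() itself and capture the error message."""
    captured = {}

    def worker():
        try:
            # A thread may never wait on its own termination.
            threading.current_thread().join()
        except RuntimeError as exc:
            captured["msg"] = str(exc)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return captured["msg"]
```

The usual fix is to make the shutdown routine skip joining whichever thread is currently executing it (e.g. compare against threading.current_thread() before calling join()).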

KoljaB commented 2 weeks ago

This should be fixed with v0.3.7 now. Some CLI parameter names changed; please look into the Readme or call "stt -h" and "stt-server -h" to see the new options.