Some questions:
import torch
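# True only when torch can see a CUDA-capable (Nvidia) GPU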
print(torch.cuda.is_available())
What does it say?
macOS, MBP M1.
The command gives me "False".
Thank you for reporting this. I don't own a Mac, so I hope you'll have a bit of patience with me while we solve this. The client/server code is still very new and not well tested.
Since the command gives you "False", the server is not using the GPU for transcription. That's normal for a Mac. The server will therefore possibly react a bit slowly, but it should still deliver some kind of realtime transcription within one or two seconds, and I'm currently unsure why that does not seem to happen at all.
I will need to add more logging to both server and client, especially an option for the server to log its chunk handling in detail. Will release a new version soon. I'll give both client and server an option to save the recorded and received audio chunks to a file, so we can see if the audio is correctly recorded and transmitted.
I think with this we should then have a good toolset to inspect this further and hopefully find out what goes wrong.
Sounds good, I am happy to help once logging is in place. I am surprised GPU isn't enabled on macOS; I was reading and it seems torch supports GPU acceleration on macOS now?
There's MPS support, which is similar to CUDA, though I think not every CUDA operation is implemented there. I don't know if faster-whisper supports it. Currently I don't call torch with mps in RealtimeSTT, but I could open up the opportunity to do so, so we can see if it works. Btw, on Mac it's torch.backends.mps.is_available(), not torch.cuda.is_available(), to check whether torch has GPU support. Please take everything I say here with a grain of salt, I have very limited knowledge here as I have no Mac.
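For reference, a minimal sketch of how the device check could look on both platforms (just an illustration of the torch API, not how RealtimeSTT currently selects its device):

import torch

# Prefer CUDA (Nvidia), then MPS (Apple Silicon), then fall back to CPU.
# MPS has incomplete operator coverage, so some models may still need CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print(f"Using device: {device}")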
Thanks, yeah you are right: torch.backends.mps.is_available() gives True on my MBP.
New version now has additional logging:
Additional parameters for the server:
--use_extended_logging: writes extensive log messages for the recording worker that processes the audio chunks
--debug: enables debug logging for detailed server operations
--logchunks: enables logging of incoming audio chunks (periods)
--writechunks: saves received audio chunks to a WAV file

Additional parameters for the client:
--debug: enables debug logging for detailed client operations
--writechunks: saves recorded audio chunks to a WAV file
What I'd like to know now: if you start the server with "stt-server --debug --use_extended_logging --writechunks serverchunks.wav" and then start the client and start to talk, do you see the chunk log messages on the server?
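For the client side, the corresponding call would presumably be (clientchunks.wav is just an example filename, using the client flags listed above):

stt --debug --writechunks clientchunks.wav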
Not sure if something else changed in this version, but I tested and it is kind of working now with default params. All those log lines appear as you suggested, and the audio that comes through to the server in the output file seems correct.
The first sentence I say after starting the client works; shortly after, the client errors out:
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Received data message: {"type": "fullSentence", "text": "Should fail soon."}
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Message type: fullSentence
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Full sentence received: Should fail soon.
Should fail soon.
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Stopping client and cleaning up resources.
[DEBUG][2024-11-02 15:30:44][Thread-1 (run_forever)] WebSocket connection closed
[DEBUG][2024-11-02 15:30:44][Thread-1 (run_forever)] Close status code: None
[DEBUG][2024-11-02 15:30:44][Thread-1 (run_forever)] Close message: None
[DEBUG][2024-11-02 15:30:44][Thread-1 (run_forever)] WebSocket object: <websocket._app.WebSocketApp object at 0x14bbf0c50>
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Error processing data message: cannot join current thread
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] WebSocket connection closed
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Close status code: None
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] Close message: None
[DEBUG][2024-11-02 15:30:44][Thread-2 (run_forever)] WebSocket object: <websocket._app.WebSocketApp object at 0x14bbf1550>
[DEBUG][2024-11-02 15:30:44][Thread-4 (command_processor)] Command processor thread stopping
[DEBUG][2024-11-02 15:30:44][Thread-3 (record_and_send_audio)] Sending audio chunk 282: 2048 bytes, metadata: {"sampleRate": 16000}
[DEBUG][2024-11-02 15:30:44][Thread-3 (record_and_send_audio)] Error sending audio data: Connection is already closed.
[DEBUG][2024-11-02 15:30:44][Thread-3 (record_and_send_audio)] Cleaning up audio resources
[DEBUG][2024-11-02 15:30:44][Thread-3 (record_and_send_audio)] Stopping and closing audio stream
[DEBUG][2024-11-02 15:30:44][MainThread] Stopping client and cleaning up resources.
[DEBUG][2024-11-02 15:30:44][Thread-3 (record_and_send_audio)] Terminating PyAudio interface
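For what it's worth, "cannot join current thread" is the RuntimeError Python raises when cleanup code running on a worker thread calls join() on that same thread. A minimal sketch of the usual guard (hypothetical names, not the actual RealtimeSTT cleanup code):

import threading

def stop_client(threads):
    # join() raises RuntimeError("cannot join current thread") when a
    # thread tries to join itself, so skip the current thread on shutdown.
    for t in threads:
        if t is not threading.current_thread() and t.is_alive():
            t.join(timeout=2.0)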
Oh, you are right, I can reproduce that. I made an additional change that probably introduced this; will fix it. The client is supposed to only transcribe a single sentence. That said, I realize it's probably best to add an option to make it transcribe continuously too.
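In the meantime, continuous transcription is possible with the local (non client/server) API, roughly following the pattern from the project README (assuming default parameters work on your setup):

from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    recorder = AudioToTextRecorder()
    while True:
        # text() blocks until a full sentence has been transcribed
        print(recorder.text())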
Trying to use the default server/client packaged with RealtimeSTT and it doesn't seem to work. I ran

pip install RealTimeSTT

then in one shell:

stt-server

(setup took a bit). After successfully running the server, I ran in a different shell:

stt

It starts sending audio, but no transcription occurs:

Log from server
From client

Tried with different microphones; the data seems to be sent, but transcription isn't occurring. All default settings.