KoljaB / RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
MIT License

Docker is not working #110

Open IlyaShkurenko opened 2 months ago

IlyaShkurenko commented 2 months ago

I spent a whole day trying to run this in different ways and none of them worked. There are always some errors. Here is the Docker error: python3: can't open file '/app/example_browserclient/server.py': [Errno 2] No such file or directory

IlyaShkurenko commented 2 months ago

I also get this error as soon as I say something:

Say something...
Traceback (most recent call last):
  File "/Users/illiashkurenko/.pyenv/versions/3.10.11/lib/python3.10/site-packages/RealtimeSTT/tests/simple_test.py", line 6, in <module>
    while (True): print(recorder.text(), end=" ", flush=True)
  File "/Users/illiashkurenko/.pyenv/versions/3.10.11/lib/python3.10/site-packages/RealtimeSTT/audio_recorder.py", line 893, in text
    return self.transcribe()
  File "/Users/illiashkurenko/.pyenv/versions/3.10.11/lib/python3.10/site-packages/RealtimeSTT/audio_recorder.py", line 844, in transcribe
    status, result = self.parent_transcription_pipe.recv()
  File "/Users/illiashkurenko/.pyenv/versions/3.10.11/lib/python3.10/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/Users/illiashkurenko/.pyenv/versions/3.10.11/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/Users/illiashkurenko/.pyenv/versions/3.10.11/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

KoljaB commented 2 months ago

Docker support came from a contributed PR, and I have no clue about Linux. The EOFError hints that the child process responsible for transcription terminated unexpectedly, closing the pipe before the parent process could read from it. I'd test the faster_whisper installation separately first. It might be raising an exception in the transcribe method.

IlyaShkurenko commented 2 months ago

So how can I run it, and what does this error depend on, given that other people don't get it?

KoljaB commented 2 months ago

The EOFError on this recv() call means the pipe was closed from the other end before a response was sent. The most probable reason is that the transcription worker process terminates unexpectedly, likely because of an unhandled exception, before it can respond. That is why I asked you to test faster_whisper separately. I assume faster_whisper does not run correctly and causes the worker to exit. Note that if faster_whisper fails we don't see any logging here, because transcription happens in a separate process that prints to its own invisible stdout. I might route that back to the main stdout soon, but that could potentially introduce new problems.

Other possible reasons for the EOFError (which I think are rather unlikely) are problems with the multiprocessing setup that cause the pipe to close prematurely, or the shutdown_event somehow being set unexpectedly, causing the worker to exit its main loop.
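To make the failure mode concrete, here is a minimal standard-library sketch of how recv() behaves when the other end of a pipe goes away without sending anything. It is not RealtimeSTT's code: the names recv_status, parent_conn and worker_conn are hypothetical, and closing worker_conn by hand only stands in for a worker process that has exited.

```python
from multiprocessing import Pipe


def recv_status(conn):
    """Return ("ok", payload), or ("closed", None) if the peer closed the pipe first."""
    try:
        return "ok", conn.recv()
    except EOFError:
        # Nothing left to read and the other end is closed.
        return "closed", None


# Normal case: the worker end sends a result and the parent reads it.
parent_conn, worker_conn = Pipe()
worker_conn.send(("success", "hello world"))
normal = recv_status(parent_conn)
worker_conn.close()

# Failure case: the worker end is closed without sending anything,
# which is what the parent sees if the worker process has died.
parent_conn2, worker_conn2 = Pipe()
worker_conn2.close()
failed = recv_status(parent_conn2)

print(normal)
print(failed)
```

The EOFError itself carries no information about why the other end closed, so the actual cause has to be found on the worker side.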

You can find information on how to test faster_whisper in their repo.
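As a rough sketch of such a standalone check, the snippet below runs faster_whisper outside of RealtimeSTT so that any exception shows up directly in the terminal. The helper name check_faster_whisper and the audio path are assumptions for illustration. The "tiny" model with device="cpu" and compute_type="int8" is chosen only to keep the check small and independent of CUDA.

```python
import os


def check_faster_whisper(audio_path, model_size="tiny"):
    """Try one transcription with faster_whisper alone and return (ok, message)."""
    try:
        from faster_whisper import WhisperModel
    except (ImportError, OSError) as exc:
        return False, f"faster_whisper is not importable: {exc}"

    if not os.path.isfile(audio_path):
        return False, f"audio file not found: {audio_path}"

    try:
        # CPU with int8 keeps this check independent of the GPU setup.
        model = WhisperModel(model_size, device="cpu", compute_type="int8")
        segments, _info = model.transcribe(audio_path)
        # segments is a generator, so joining it is what runs the transcription.
        text = " ".join(segment.text.strip() for segment in segments)
    except Exception as exc:
        return False, f"transcription failed: {type(exc).__name__}: {exc}"

    return True, text
```

Calling print(check_faster_whisper("sample.wav")) with a short recording of your own (sample.wav is a placeholder) should either print the transcribed text or the exception that would otherwise stay hidden in the worker process.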

IlyaShkurenko commented 2 months ago

I was able to run it in the cloud, but thanks for the help anyway.

Also, does increasing resources such as GPU, CPU, and RAM affect performance?

I had 2 GPUs, 8 vCPUs (4 cores), and 30 GB of memory, then changed to 4 GPUs, 16 vCPUs (8 cores), and 60 GB of memory, and I didn't see any change in performance. With large-v2, fullSentence arrives after 2-3 seconds for a phrase of 20-25 words.

Is there a way to speed it up to 0.5-1 second?

P.S. Thanks for a good job!

KoljaB commented 2 months ago

The number of GPUs should not have any effect. The one thing that really speeds it up is CUDA being available for the torch installation (you can check with torch.cuda.is_available()). In that case, transcription time even with large-v2 should be WAY below 1 second for the small chunks we provide here.
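A small sketch of that check, wrapped so it also reports a missing or broken torch installation instead of crashing. The helper name cuda_report is hypothetical, and only standard torch.cuda calls are used.

```python
def cuda_report():
    """Describe whether the installed torch build can see a CUDA device."""
    try:
        import torch
    except (ImportError, OSError) as exc:
        return f"torch is not importable: {exc}"

    if not torch.cuda.is_available():
        # Typical causes: a CPU-only torch build or a missing/mismatched driver.
        return f"torch {torch.__version__}: CUDA is not available"

    names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return f"torch {torch.__version__}: CUDA is available on {names}"


print(cuda_report())
```

If this reports that CUDA is not available on a machine with GPUs, the usual suspect is a CPU-only torch build, in which case adding more GPUs changes nothing.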