KoljaB / RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
MIT License

macOS #7

Closed hanxirui closed 1 week ago

hanxirui commented 1 year ago

Code:

import ssl

# Work around macOS SSL certificate errors when torch.hub downloads the model
ssl._create_default_https_context = ssl._create_unverified_context
import torch

# Pre-download the Silero VAD model used by RealtimeSTT
model, _ = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad", verbose=True)

from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    recorder = AudioToTextRecorder(spinner=False)

    print("Say something...")
    while True:
        print(recorder.text(), end=" ", flush=True)

Error:


RealTimeSTT: root - ERROR - Unhandled exeption in _recording_worker: 
Exception in thread Thread-1 (_recording_worker):
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/hanxirui/workspace/python/DataScience/venv/lib/python3.11/site-packages/RealtimeSTT/audio_recorder.py", line 667, in _recording_worker
    while self.audio_queue.qsize() > self.allowed_latency_limit:
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/queues.py", line 126, in qsize
    return self._maxsize - self._sem._semlock._get_value()
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError
KoljaB commented 1 year ago

Damn. Researched it and yes, macOS does not support the qsize() method of multiprocessing.Queue.

Need to find a workaround for this. Sorry for this issue.
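
One possible direction, just a sketch (not necessarily what will end up in the library): wrap the queue and track its size with a shared counter, since multiprocessing.Queue.qsize() relies on sem_getvalue(), which macOS does not implement:

import multiprocessing as mp

class CountedQueue:
    """Queue wrapper that keeps an approximate size in a shared counter,
    avoiding Queue.qsize(), which raises NotImplementedError on macOS."""

    def __init__(self):
        self._queue = mp.Queue()
        self._size = mp.Value('i', 0)  # shared int, works on every platform

    def put(self, item):
        self._queue.put(item)
        with self._size.get_lock():
            self._size.value += 1

    def get(self, *args, **kwargs):
        item = self._queue.get(*args, **kwargs)
        with self._size.get_lock():
            self._size.value -= 1
        return item

    def qsize(self):
        # Approximate: counter and queue are not updated atomically together
        return self._size.value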

KoljaB commented 1 year ago

Updated audio_recorder.py to a new version which hopefully fixes this (not available via pip install yet). It would be great to hear feedback on whether that works.

KoljaB commented 1 year ago

The fix is now also available via pip install (untested though, unfortunately I have no Mac):

pip install --upgrade realtimestt==0.1.7
eelxpeng commented 1 year ago

Any idea what causes this error?

Say something...
RealTimeSTT: root - WARNING - Audio queue size exceeds latency limit. Current size: 84. Discarding old audio chunks.
zsh: segmentation fault  PYTHONPATH=. python tests/simple_test.py
Process Process-2:
Traceback (most recent call last):                                                                                                                                           
  File "/Users/xiaopel/opt/anaconda3/envs/torch2/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/xiaopel/opt/anaconda3/envs/torch2/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/xiaopel/Github/Startup/RealtimeSTT/RealtimeSTT/audio_recorder.py", line 369, in _transcription_worker
    audio, language = conn.recv()
  File "/Users/xiaopel/opt/anaconda3/envs/torch2/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/Users/xiaopel/opt/anaconda3/envs/torch2/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/Users/xiaopel/opt/anaconda3/envs/torch2/lib/python3.9/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
KoljaB commented 1 year ago

I'm sorry, I can't really tell what's going wrong here. After a bit of research it seems macOS does have some issues with Python's multiprocessing. Maybe it's worth a try with a newer Python version; Python 3.9 is already two years old.

eelxpeng commented 1 year ago

Thanks @KoljaB. I got around this by calling a transcribe function directly instead of sending the data to transcribe via a Pipe. From the design, it seems there is no need to start the transcription_worker process at all. Anything I missed?
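
For reference, a rough sketch of that kind of in-process transcription with faster-whisper (the function name and parameters here are illustrative, not the actual change):

import numpy as np
from faster_whisper import WhisperModel

# Load the model once in the main process instead of a worker process
model = WhisperModel("tiny", device="cpu", compute_type="int8")

def transcribe(audio: np.ndarray, language: str = "en") -> str:
    # faster-whisper accepts raw 16 kHz mono float32 audio as a numpy array
    segments, _info = model.transcribe(audio, language=language)
    return " ".join(segment.text.strip() for segment in segments)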

KoljaB commented 1 year ago

If you run recorder.text() in a loop, the transcription of the last sentence consumes so many resources that the voice activity detection is not reliable while the transcription runs. This is a problem if the transcription takes some time (long last sentence) and the next sentence is very short (depends on the VAD): the short next sentence might then not be detected. So basically it is a fix for a quite specialized problem. I did not realize that multiprocessing would introduce as many new problems as it did, especially on non-Windows platforms.
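
To illustrate the design being described (a simplified sketch, not the actual RealtimeSTT code): transcription runs in its own process and receives audio over a Pipe, so the main process can keep running the VAD/recording loop while a long sentence is still being transcribed:

import multiprocessing as mp

def transcription_worker(conn):
    # The heavy model lives in a separate process so it cannot starve the VAD loop
    from faster_whisper import WhisperModel
    model = WhisperModel("tiny", device="cpu", compute_type="int8")
    while True:
        audio, language = conn.recv()  # blocks until a finished utterance arrives
        segments, _ = model.transcribe(audio, language=language)
        conn.send(" ".join(s.text for s in segments))

if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe()
    mp.Process(target=transcription_worker, args=(child_conn,), daemon=True).start()
    # Main process: keep recording and running VAD here; send finished
    # utterances with parent_conn.send((audio, "en")) and poll parent_conn
    # for results without blocking the audio loop.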

abhishek-tg commented 11 months ago

@eelxpeng what changes did you make in audio_recorder.py when replacing the transcription_worker with a direct transcribe call? Can you post the diff?

KoljaB commented 11 months ago

Maybe this checkpoint helps; it is from before multiprocessing was introduced.

abhishek-tg commented 11 months ago

Thanks, but that causes the stream-closed issue from https://github.com/KoljaB/RealtimeSTT/issues/3, and torch together with faster-whisper is already an issue: https://github.com/SYSTRAN/faster-whisper/issues/137

ekimia commented 9 months ago

Still the same issue; tried on Python 3.11 and 3.12. Will take a look later.

astuteprogrammer commented 9 months ago

Great work, KoljaB! I found the fix after breaking my head for a while on macOS: replace the multiprocessing Queue with a Manager.Queue. It works perfectly.

I don't need the wake words and some other features, so I had to strip out certain parts of the code. Still, it serves my purpose.

Another thing I noticed was the device index. It works without passing one, but in my case the mic was on device 1; it took me a while to list the channels and identify the right value (see the sketch after the snippet below).

from multiprocessing import Manager

# Manager.Queue() is a proxy to a queue living in a separate manager process,
# so qsize() works even on macOS, where multiprocessing.Queue.qsize() raises
# NotImplementedError.
manager = Manager()
queue = manager.Queue()

# ... use the queue ...

if queue.qsize() > 0:  # Check for elements
    print("Queue has elements.")
KoljaB commented 9 months ago

Thanks a lot for this hint. I recently switched RealtimeSTT (and RealtimeTTS) from multiprocessing to torch.multiprocessing. Does the problem still exist with v1.9.0? (I hope to get lucky and the switch to torch.multiprocessing does the same for macOS.) For the device_index I probably need to add an option to list the devices.

fronx commented 8 months ago

Unfortunately the switch to torch.multiprocessing reintroduced the qsize issue on macOS.

ehartford commented 8 months ago

Is this using PyTorch MPS acceleration?

KoljaB commented 8 months ago

Unfortunately the switch to torch.multiprocessing reintroduced the qsize issue on macOS.

I'll make a fix for this.

is this using pytorch MPS acceleration?

RealtimeSTT depends on the faster-whisper library, which in turn uses CTranslate2. An issue discussion in the faster-whisper GitHub repo says there's no built-in support for AMD, MPS etc. acceleration, but that it's possible to enable these backends by compiling CTranslate2 from source with the desired backend before installing faster-whisper.

So, if I got this right, for MPS acceleration you would first compile CTranslate2 with the necessary backend support (MPS enabled), then proceed with the installation of RealtimeSTT, which installs faster-whisper but should not override the manually compiled version of CTranslate2.
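
As a quick sanity check (a sketch assuming the standard CTranslate2 Python API), you can ask the installed build what it actually supports before and after recompiling:

import ctranslate2

# A stock pip wheel typically reports CPU compute types only (plus CUDA on
# Linux/Windows builds); a custom build would be needed for anything else.
print("CPU compute types:", ctranslate2.get_supported_compute_types("cpu"))
print("CUDA device count:", ctranslate2.get_cuda_device_count())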

KoljaB commented 8 months ago

Unfortunately the switch to torch.multiprocessing reintroduced the qsize issue on macOS.

Should be fixed with v0.1.12 now.

fronx commented 8 months ago

Yay it's working! Thank you 🫶🏼

saurabh-ontoforce commented 2 months ago

Thank you, the Manager.Queue approach quoted above works for me as well.

  • One note: I had to settle on Python 3.11 for faster-whisper and the other dependencies to work.