KoljaB / RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
MIT License

Launches but does not display any text #50

Open Denintendator opened 7 months ago

Denintendator commented 7 months ago

All I get is a spinner with the text "speak now". I tried the large-v3, small and tiny models, with no difference. The mic works: I just tested it by recording audio with Python. It is probably trying to transcribe, because the CPU is heavily loaded, but there is no result.

macOS Sonoma 14.4

There are no errors in the console, just a warning: [2024-04-24 23:53:30.031] [ctranslate2] [thread 8069444] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.

KoljaB commented 7 months ago

This warning can be ignored; it must be some other issue. The thing is, some of the libraries I use (mostly WebRTCVad) don't throw errors or log anything when they detect no voice activity or wake word, or even when something is wrong with the input, so the cause is hard to find. There is also not much I can do to verify things with realtime chunks.

The pipeline is: Microphone recording -> PvPorcupine wake word detection -> WebRTCVad -> SileroVAD -> Whisper. Any of the last four can fail. WebRTCVad (and SileroVAD) often don't log much, and WebRTCVad is also sensitive to the incoming chunks. PvPorcupine is rarely used and logs a bit better. Whisper would tell you if something was off.

Please try this as the first lines of your script:

import logging
logging.basicConfig(level=logging.DEBUG)

Then create the AudioToTextRecorder object with recorder = AudioToTextRecorder(level=logging.DEBUG)

This sets the most verbose log level for all the libraries used. You may then see something in the log that explains why it fails (sometimes yes, sometimes no; if not, you basically have to test every single library on its own).
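On the point that WebRTCVad is sensitive to the incoming chunks: as far as I know, py-webrtcvad only accepts 16-bit mono PCM at 8000, 16000, 32000 or 48000 Hz, in frames of exactly 10, 20 or 30 ms. The sketch below uses only the standard library to check a raw chunk against those constraints and to confirm that the microphone is delivering more than silence. The helper names `is_valid_vad_frame` and `rms` are made up for this example and are not part of RealtimeSTT or WebRTCVad.

```python
import array
import math

# Constraints of py-webrtcvad as I understand them (16-bit mono PCM only).
VALID_RATES = (8000, 16000, 32000, 48000)
VALID_FRAME_MS = (10, 20, 30)
BYTES_PER_SAMPLE = 2


def is_valid_vad_frame(frame: bytes, sample_rate: int) -> bool:
    """Return True if the frame is exactly 10, 20 or 30 ms long at sample_rate."""
    if sample_rate not in VALID_RATES:
        return False
    return any(
        len(frame) == sample_rate * ms // 1000 * BYTES_PER_SAMPLE
        for ms in VALID_FRAME_MS
    )


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of 16-bit PCM in native byte order.

    The frame length must be a multiple of 2 bytes, otherwise
    array.frombytes raises ValueError.
    """
    samples = array.array("h")
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

If `rms` stays near zero while you speak, the problem is probably at the recording stage rather than in the VAD or Whisper. If it looks healthy but `is_valid_vad_frame` is False for the chunks you pass on, the chunk size is worth a closer look.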

Denintendator commented 7 months ago

Thanks for the reply. I tried it and got this:

RealTimeSTT: torio._extension.utils - DEBUG - Loading FFmpeg6
DEBUG:torio._extension.utils:Failed to load FFmpeg6 extension.
Traceback (most recent call last):
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
    _load_lib(lib)
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 94, in _load_lib
    torch.ops.load_library(path)
  File ".../.venv/lib/python3.12/site-packages/torch/_ops.py", line 933, in load_library
    ctypes.CDLL(path)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(.../.venv/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg6.so, 0x0006): Library not loaded: @rpath/libavutil.58.dylib
  Referenced from: <126DCB6D-A04F-381E-9269-B53E8E35F26A> .../.venv/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg6.so
  Reason: tried: '/usr/lib/libavutil.58.dylib' (no such file, not in dyld cache)
RealTimeSTT: torio._extension.utils - DEBUG - Failed to load FFmpeg6 extension.
Traceback (most recent call last):
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
    _load_lib(lib)
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 94, in _load_lib
    torch.ops.load_library(path)
  File ".../.venv/lib/python3.12/site-packages/torch/_ops.py", line 933, in load_library
    ctypes.CDLL(path)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(/.../.venv/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg6.so, 0x0006): Library not loaded: @rpath/libavutil.58.dylib
  Referenced from: <126DCB6D-A04F-381E-9269-B53E8E35F26A> .../.venv/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg6.so
  Reason: tried: '/usr/lib/libavutil.58.dylib' (no such file, not in dyld cache)
DEBUG:torio._extension.utils:Loading FFmpeg5
RealTimeSTT: torio._extension.utils - DEBUG - Loading FFmpeg5
DEBUG:torio._extension.utils:Failed to load FFmpeg5 extension.
Traceback (most recent call last):
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
    _load_lib(lib)
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 94, in _load_lib
    torch.ops.load_library(path)
  File ".../.venv/lib/python3.12/site-packages/torch/_ops.py", line 933, in load_library
    ctypes.CDLL(path)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(.../.venv/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg5.so, 0x0006): Library not loaded: @rpath/libavutil.57.dylib
  Referenced from: <20BD953E-E821-312C-A431-E77499CC9FF6> .../.venv/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg5.so
  Reason: tried: '/usr/lib/libavutil.57.dylib' (no such file, not in dyld cache)
RealTimeSTT: torio._extension.utils - DEBUG - Failed to load FFmpeg5 extension.
Traceback (most recent call last):
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
    _load_lib(lib)
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 94, in _load_lib
    torch.ops.load_library(path)
  File ".../.venv/lib/python3.12/site-packages/torch/_ops.py", line 933, in load_library
    ctypes.CDLL(path)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(.../.venv/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg5.so, 0x0006): Library not loaded: @rpath/libavutil.57.dylib
  Referenced from: <20BD953E-E821-312C-A431-E77499CC9FF6> .../.venv/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg5.so
  Reason: tried: '/usr/lib/libavutil.57.dylib' (no such file, not in dyld cache)
DEBUG:torio._extension.utils:Loading FFmpeg4
RealTimeSTT: torio._extension.utils - DEBUG - Loading FFmpeg4
DEBUG:torio._extension.utils:Failed to load FFmpeg4 extension.
Traceback (most recent call last):
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
    _load_lib(lib)
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 94, in _load_lib
    torch.ops.load_library(path)
  File ".../.venv/lib/python3.12/site-packages/torch/_ops.py", line 933, in load_library
    ctypes.CDLL(path)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(.../.venv/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg4.so, 0x0006): Library not loaded: @rpath/libavutil.56.dylib
  Referenced from: <F0621199-F06C-332D-AC1D-7E589CAC73E5> .../.venv/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg4.so
  Reason: tried: '/usr/lib/libavutil.56.dylib' (no such file, not in dyld cache)
RealTimeSTT: torio._extension.utils - DEBUG - Failed to load FFmpeg4 extension.
Traceback (most recent call last):
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
    _load_lib(lib)
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 94, in _load_lib
    torch.ops.load_library(path)
  File ".../.venv/lib/python3.12/site-packages/torch/_ops.py", line 933, in load_library
    ctypes.CDLL(path)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: dlopen(.../.venv/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg4.so, 0x0006): Library not loaded: @rpath/libavutil.56.dylib
  Referenced from: <F0621199-F06C-332D-AC1D-7E589CAC73E5> .../.venv/lib/python3.12/site-packages/torio/lib/libtorio_ffmpeg4.so
  Reason: tried: '/usr/lib/libavutil.56.dylib' (no such file, not in dyld cache)
DEBUG:torio._extension.utils:Loading FFmpeg
RealTimeSTT: torio._extension.utils - DEBUG - Loading FFmpeg
DEBUG:torio._extension.utils:Failed to load FFmpeg extension.
Traceback (most recent call last):
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 106, in _find_versionsed_ffmpeg_extension
    raise RuntimeError(f"FFmpeg{version} extension is not available.")
RuntimeError: FFmpeg extension is not available.
RealTimeSTT: torio._extension.utils - DEBUG - Failed to load FFmpeg extension.
Traceback (most recent call last):
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/torio/_extension/utils.py", line 106, in _find_versionsed_ffmpeg_extension
    raise RuntimeError(f"FFmpeg{version} extension is not available.")
RuntimeError: FFmpeg extension is not available.

It seems to be trying different versions of ffmpeg but can't find the library. There is no ffmpeg-related library in "/usr/lib" at all, although ffmpeg is definitely installed. "ffmpeg -version" outputs:

built with Apple clang version 15.0.0 (clang-1500.3.9.4)
configuration: --prefix=/usr/local/Cellar/ffmpeg/7.0 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopenvino --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox
libavutil      59.  8.100 / 59.  8.100
libavcodec     61.  3.100 / 61.  3.100
libavformat    61.  1.100 / 61.  1.100
libavdevice    61.  1.100 / 61.  1.100
libavfilter    10.  1.100 / 10.  1.100
libswscale      8.  1.100 /  8.  1.100
libswresample   5.  1.100 /  5.  1.100
libpostproc    58.  1.100 / 58.  1.100

Kind of a weird problem. I didn't find anything like it on the Internet.
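Reading the two outputs side by side suggests a version mismatch. The torio extensions in the traceback ask for libavutil.58, .57 and .56 (for FFmpeg6, FFmpeg5 and FFmpeg4), while the installed ffmpeg 7.0 reports libavutil 59. The lookup was also only tried under /usr/lib, whereas this ffmpeg is configured with a /usr/local/Cellar prefix. A small sketch for checking the version part from the `ffmpeg -version` text follows. The function names are hypothetical, and the mapping is taken only from the log above, not from torio's documentation.

```python
import re

# libavutil major versions that the torio extensions in the log above link against.
TORIO_LIBAVUTIL_MAJORS = {56: "FFmpeg4", 57: "FFmpeg5", 58: "FFmpeg6"}


def libavutil_major(version_output: str):
    """Extract the libavutil major version from `ffmpeg -version` text, or None."""
    match = re.search(r"^libavutil\s+(\d+)\.", version_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def matching_torio_extension(version_output: str):
    """Return the torio extension name that matches this ffmpeg, or None."""
    return TORIO_LIBAVUTIL_MAJORS.get(libavutil_major(version_output))
```

The text to pass in can be obtained with `subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True).stdout`. For the output shown above, no extension matches, which is consistent with all three load attempts failing.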

rajaiswal commented 6 months ago

@Denintendator running into the same issue, were you able to find a solution to this?

rihp commented 6 months ago

I also ran into this issue. @rajaiswal, did you get around to fixing it?

I'm running on a MacBook with an M1 chip.

cc @KoljaB

M4a1x commented 4 months ago

I tried looking into this issue a bit.

Based on my findings, the errors are generated by torchaudio/torio, which tries to load the ffmpeg libraries from the system when it is imported by silero_vad. I would say this behaviour is expected if no ffmpeg libraries (as opposed to the CLI) are installed on the system.

Since ffmpeg is only an optional dependency that silero_vad uses to load and save audio files, and it is not needed for normal operation (silero_vad still loads successfully, see the output), I believe the error can safely be ignored.
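If the FFmpeg tracebacks drown out the rest of the debug output, one option is to raise the level of just that logger and leave everything else at DEBUG. This is a minimal sketch using the standard `logging` module. The logger name is copied from the log lines above, and I am assuming the messages go through normal logger levels, so filtering at the logger also hides them from any extra handlers.

```python
import logging

logging.basicConfig(level=logging.DEBUG)

# Logger name taken from the log output above ("torio._extension.utils").
# Raising the level on the parent "torio" logger also covers its children,
# because child loggers without their own level inherit the parent's.
logging.getLogger("torio").setLevel(logging.WARNING)
```

Put this before the import that pulls in silero_vad, so the level is already set when torio tries the FFmpeg extensions.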

In my case, the missing output came from an issue with my microphone. Apparently the indexes for the device id parameter start at 1 (not 0), and Windows can get quite confused by Bluetooth microphones. Playing around with the IDs and unmuting the microphone in the Windows settings made it work for me.
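Rather than guessing IDs, it may help to list the devices PyAudio sees and keep only those that can record. The sketch below splits this into a pure filter and a PyAudio query. Both function names are made up for this example, and the query needs the pyaudio package installed. The dictionary keys (`index`, `name`, `maxInputChannels`) are the ones PyAudio returns from `get_device_info_by_index`.

```python
def input_devices(device_infos):
    """Keep only devices that can record, as (index, name) pairs."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def query_pyaudio_devices():
    """Collect device info dicts from PyAudio (requires the pyaudio package)."""
    import pyaudio  # imported lazily so the filter above works without it

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()
```

Calling `input_devices(query_pyaudio_devices())` gives the candidate indexes to try. Assuming your RealtimeSTT version exposes an `input_device_index` argument on AudioToTextRecorder, that is where the chosen index would go.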

Good luck!