KoljaB / RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
MIT License

Real-Time transcription works poorly #108

Open andreykuzovlevv opened 2 months ago

andreykuzovlevv commented 2 months ago

I took the code you provided in tests/realtimestt_test.py and it's working way worse than your showcase:

https://github.com/user-attachments/assets/2be98186-8442-4205-b59a-0713dd8e127e

KoljaB commented 2 months ago

Some things you could try:

  1. Set 'language': 'en' in realtimestt_test.py to fix the language to English and get rid of the automatic translation bug
  2. Set 'silero_deactivity_detection': False in the realtimestt_test.py
  3. To speed up your transcription, make sure you have CUDA installed (for example, for CUDA 12.1 install a matching torch build with pip install torch==2.3.1+cu121 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121 )
  4. This sudden Russian language detection with the big large-v2 model is very strange and hints at something being wrong. Please verify that your CUDA and torch versions meet the faster_whisper requirements. If not, please change to safe versions (for example 12.1 for CUDA and 2.3.1, 2.2.2 or 2.1.2 for torch). Maybe check numpy too (sometimes numpy>2.0.0 causes problems on some systems; use, for example, numpy==1.23.5 instead). It could also be the transformers version, though I'm not sure about that, or something with CTranslate2.
  5. If you need automatic language detection and want to get rid of the automatic translation bug without setting a fixed language, apply this faster_whisper fix, which is not yet available in a release, to your local faster_whisper/transcribe.py
  6. Try installing RealtimeSTT in a fresh virtual environment, not in your main Python environment
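
A rough sketch of how suggestions 1, 2 and 4 could be checked in one go. It assumes the test script keeps its recorder settings in a plain dict (the 'language': 'en' notation above suggests this, but it is an assumption). The helper names apply_overrides and installed_versions are made up for illustration and are not part of RealtimeSTT. The version check uses only the standard library and reports which packages are installed, so the output can be compared by hand against the versions mentioned above.

```python
from importlib import metadata

# Settings taken from suggestions 1 and 2 above.
SUGGESTED_OVERRIDES = {
    "language": "en",                      # fixed English, no auto-detection
    "silero_deactivity_detection": False,
}

# Distribution names as published on PyPI (assumed to match your install).
PACKAGES = ("torch", "torchaudio", "numpy", "faster-whisper",
            "ctranslate2", "transformers")


def apply_overrides(recorder_config, overrides=SUGGESTED_OVERRIDES):
    """Return a copy of recorder_config with the suggested settings applied.

    Hypothetical helper: recorder_config stands for whatever dict of
    keyword arguments your script passes to the recorder.
    """
    merged = dict(recorder_config)
    merged.update(overrides)
    return merged


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_status():
    """Return (cuda_available, cuda_version), or (None, None) without torch."""
    try:
        import torch
    except ImportError:
        return None, None
    return torch.cuda.is_available(), torch.version.cuda


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
    available, cuda_version = cuda_status()
    print(f"CUDA available: {available}, torch built for CUDA: {cuda_version}")
    # Example: start from your existing settings and apply the overrides.
    print(apply_overrides({"model": "large-v2"}))
```

Running this inside the fresh virtual environment from suggestion 6 makes it easier to see whether a mismatched torch, numpy or CTranslate2 version is carried over from the main environment.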