KoljaB / RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
MIT License

RealtimeSTT on AMD Guide #107

Open TheTrustedComputer opened 2 months ago

TheTrustedComputer commented 2 months ago

Below is a guide to running RealtimeSTT on AMD GPUs. In most cases, building PyTorch and ONNX Runtime for ROCm, or replacing them with their ROCm builds, is enough. Here it is not, because CTranslate2 also needs to be rebuilt for ROCm. Unfortunately, CTranslate2 does not officially support ROCm, but someone has forked it to support these cards: https://github.com/arlo-phoenix/CTranslate2-rocm

Follow the build steps from the link above. You can optionally disable OpenMP and use another BLAS library such as OpenBLAS. Then install the ROCm build of CTranslate2 with pip and test RealtimeSTT. On my 8GB 5500 XT, it seemed to function but was practically unusable: I got loads of out-of-memory errors, even with the tiny model and the beam search size set to 1.

2024-08-25 07:44:44,201 root [ERROR] - Unhandled exeption in _realtime_worker: CUDA failed with error out of memory
RealTimeSTT: root - ERROR - Unhandled exeption in _realtime_worker: CUDA failed with error out of memory
Exception in thread Thread-5 (_realtime_worker):
Traceback (most recent call last):
  File "/home/thetrustedcontainer/.python-3.11/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/home/thetrustedcontainer/.python-3.11/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/home/thetrustedcontainer/software/.RealtimeSTT-venv/lib/python3.11/site-packages/RealtimeSTT/audio_recorder.py", line 1496, in _realtime_worker
    segments, info = self.realtime_model_type.transcribe(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/thetrustedcontainer/software/.RealtimeSTT-venv/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 397, in transcribe
    encoder_output = self.encode(segment)
                     ^^^^^^^^^^^^^^^^^^^^
  File "/home/thetrustedcontainer/software/.RealtimeSTT-venv/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 838, in encode
    return self.model.encode(features, to_cpu=to_cpu)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA failed with error out of memory

I had to stick with the CPU version, which is slower but sufficient for my use case. Nevertheless, I hope this guide helps other users with AMD GPUs get RealtimeSTT running on their cards. As with any unsupported hardware, your mileage may vary.
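For anyone who wants to try the GPU first and drop back to the CPU automatically, here is a minimal sketch of that pattern. Nothing in it is RealtimeSTT's own API: `first_working_device`, `make_model` and `probe` are hypothetical names, and the only thing taken from the traceback above is that the failure surfaces as a `RuntimeError` whose message contains "out of memory".

```python
def is_oom(err):
    """True if the error message looks like the out-of-memory failure above."""
    return "out of memory" in str(err).lower()


def first_working_device(make_model, probe, devices=("cuda", "cpu")):
    """Return (device, model) for the first device where probe(model) succeeds.

    make_model(device) builds a model on that device.
    probe(model) runs a small workload and raises if the device cannot cope.
    Only out-of-memory RuntimeErrors trigger a fallback; anything else is
    re-raised so that real bugs are not hidden.
    """
    last_error = None
    for device in devices:
        try:
            model = make_model(device)
            probe(model)
            return device, model
        except RuntimeError as err:
            if not is_oom(err):
                raise
            last_error = err
    raise RuntimeError(f"out of memory on every device in {devices}") from last_error


if __name__ == "__main__":
    # Stand-in objects so the sketch runs without a GPU or any model files.
    def fake_model(device):
        return {"device": device}

    def fake_probe(model):
        if model["device"] == "cuda":
            raise RuntimeError("CUDA failed with error out of memory")

    device, _ = first_working_device(fake_model, fake_probe)
    print("using", device)
```

In practice `make_model` would load the Whisper model on the given device and `probe` would transcribe a short clip. Note that this only catches a failure during the probe; it would not catch an out-of-memory error raised later inside RealtimeSTT's `_realtime_worker` thread, which is where mine occurred.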

Related: https://github.com/KoljaB/RealtimeSTT/issues/7#issuecomment-2028047137