Below is a guide to running RealtimeSTT on AMD GPUs. Replacing PyTorch and ONNX Runtime with their ROCm builds covers most of the stack, but it is not enough on its own: CTranslate2 also needs to be rebuilt for ROCm. Unfortunately, ROCm is not officially supported by CTranslate2, but someone has forked it to support these cards: https://github.com/arlo-phoenix/CTranslate2-rocm
Follow the build steps from the link above. You can optionally disable OpenMP and use another BLAS library such as OpenBLAS. Then install the ROCm build of CTranslate2 with pip and test RealtimeSTT. On my 8GB 5500 XT it appeared to work but was effectively unusable: I got constant out-of-memory errors, even with the tiny model and a beam size of 1.
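The steps above look roughly like the following. This is only a sketch: the ONNX Runtime package name, the PyTorch index URL, and the CMake flags are assumptions that depend on your ROCm version, so treat the fork's README as authoritative.

```shell
# Swap PyTorch for its ROCm build (pick the index URL matching your ROCm version).
pip uninstall -y torch onnxruntime
pip install torch --index-url https://download.pytorch.org/whl/rocm6.0
# ONNX Runtime for ROCm is not on PyPI under a single stable name;
# you may need a wheel from AMD's repositories.

# Build CTranslate2 from the ROCm fork.
git clone --recursive https://github.com/arlo-phoenix/CTranslate2-rocm
cd CTranslate2-rocm
# Flags below are illustrative: disable MKL/OpenMP, use OpenBLAS instead.
cmake -S . -B build -DWITH_MKL=OFF -DOPENMP_RUNTIME=NONE -DWITH_OPENBLAS=ON
cmake --build build -j
sudo cmake --install build

# Install the Python wrapper from the same tree.
cd python && pip install .
```

After this, `python -c "import ctranslate2"` should succeed before you try RealtimeSTT itself.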
2024-08-25 07:44:44,201 root [ERROR] - Unhandled exeption in _realtime_worker: CUDA failed with error out of memory
RealTimeSTT: root - ERROR - Unhandled exeption in _realtime_worker: CUDA failed with error out of memory
Exception in thread Thread-5 (_realtime_worker):
Traceback (most recent call last):
  File "/home/thetrustedcontainer/.python-3.11/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/home/thetrustedcontainer/.python-3.11/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/home/thetrustedcontainer/software/.RealtimeSTT-venv/lib/python3.11/site-packages/RealtimeSTT/audio_recorder.py", line 1496, in _realtime_worker
    segments, info = self.realtime_model_type.transcribe(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/thetrustedcontainer/software/.RealtimeSTT-venv/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 397, in transcribe
    encoder_output = self.encode(segment)
    ^^^^^^^^^^^^^^^^^^^^
  File "/home/thetrustedcontainer/software/.RealtimeSTT-venv/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 838, in encode
    return self.model.encode(features, to_cpu=to_cpu)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA failed with error out of memory
I had to stick with the CPU version, which is slower but sufficient for my use case. Nevertheless, I hope this guide helps other users with AMD GPUs get RealtimeSTT running on their cards. As with any unsupported hardware, your mileage may vary.
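For reference, the CPU fallback is just a matter of constructor arguments. A minimal sketch (the int8 compute type is my assumption for keeping CPU inference tolerable, not something the library mandates):

```python
from RealtimeSTT import AudioToTextRecorder

# Run faster-whisper on the CPU instead of the flaky ROCm path.
recorder = AudioToTextRecorder(
    model="tiny",         # small model keeps CPU latency tolerable
    device="cpu",         # skip the GPU entirely
    compute_type="int8",  # int8 quantization is much faster than float32 on CPU
)

print("Speak now...")
print(recorder.text())  # blocks until an utterance is transcribed
```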
Related: https://github.com/KoljaB/RealtimeSTT/issues/7#issuecomment-2028047137