Const-me / Whisper

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
Mozilla Public License 2.0

Can you add support for the faster-whisper model? #126

Closed MrFutureV closed 1 year ago

MrFutureV commented 1 year ago

Namely, this one: https://github.com/guillaumekln/faster-whisper

Const-me commented 1 year ago

@MrFutureV It seems the project you have linked is based on CUDA.

Using CUDA means vendor lock-in to nVidia, and the company is simply too greedy. For some use cases, the cost-efficiency difference is an order of magnitude. For example, the AMD 7900 XTX and the nVidia L40 deliver equivalent FP64 performance (both the FP64 TFLOPS and the VRAM bandwidth numbers are pretty close), but the nVidia card costs almost 10x more.

Another reason to avoid CUDA is the runtime. The current version of my software is less than 1 MB of binaries with no dependencies; D3D is an essential OS component supported by Microsoft. To use CUDA, end users would need nVidia's runtime DLLs. In CUDA 12, the four files cublas64_12.dll, cublasLt64_12.dll, cudnn_cnn_infer64_8.dll, and cudnn_ops_infer64_8.dll are 1.27 GB combined. These runtime files also come with weird EULAs.

Also, the CUDA runtime has hardware compatibility issues across generations of their hardware. By contrast, D3D 11.0 shipped with Windows 7 in 2009, and now in 2023 its support is almost universal. My software even runs on Linux under Wine, thanks to Valve's work on DXVK 2.0.