SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

no kernel image is available for execution on the device #806

Open liu1352183717 opened 4 months ago

liu1352183717 commented 4 months ago

[ctranslate2] [thread 10436] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.

Detected language 'zh' with probability 1.000000

```
Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\pythonProject\main.py", line 23, in <module>
    for segment in segments:
  File "C:\Users\Administrator.conda\envs\faster_whisper\lib\site-packages\faster_whisper\transcribe.py", line 1106, in restore_speech_timestamps
    for segment in segments:
  File "C:\Users\Administrator.conda\envs\faster_whisper\lib\site-packages\faster_whisper\transcribe.py", line 511, in generate_segments
    encoder_output = self.encode(segment)
  File "C:\Users\Administrator.conda\envs\faster_whisper\lib\site-packages\faster_whisper\transcribe.py", line 762, in encode
    return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
```
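`cudaErrorNoKernelImageForDevice` means the CTranslate2 binary ships no CUDA kernels compiled for this GPU's architecture. A quick way to check which architecture your card reports (a sketch; the `compute_cap` query field requires a reasonably recent NVIDIA driver):

```shell
# Print the GPU name and its CUDA compute capability, e.g. "GeForce GTX 950, 5.2".
nvidia-smi --query-gpu=name,compute_cap --format=csv
```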

Purfview commented 4 months ago

What is your GPU model?

liu1352183717 commented 4 months ago

GeForce GTX 950

LeonVeganMan commented 4 months ago

Hi,

Same problem here with a Tesla M40, CUDA 12.4.

Thanks. Ciao. L.

laraws commented 3 months ago

Same issue; my GPU is a GTX 950M.

seclog commented 3 months ago

I had the same problem and solved it. The version of CTranslate2 used by faster-whisper is too new and is no longer compatible with older CUDA cards, such as those with compute capability 5.0. Replace ctranslate2.dll in the faster-whisper installation directory with the version 3.24 ctranslate2.dll. Alternatively, download the CTranslate2 4.2 source, compile a DLL for your card's native CUDA arch, and swap that into the faster-whisper installation directory.
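On a pip-based install, a roughly equivalent workaround to swapping the DLL by hand is pinning the wheel (a sketch; recent faster-whisper releases may declare a ctranslate2 4.x requirement, so an older faster-whisper may be needed alongside the pin):

```shell
# Downgrade to the CTranslate2 release this thread reports as the last one
# whose prebuilt binaries still ship kernels for compute capability 5.0 cards.
pip install --force-reinstall ctranslate2==3.24.0
```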

laraws commented 3 months ago

> I had the same problem and have solved it. […]

Thanks for your reply. I tried it, but it didn't work.

seclog commented 3 months ago

The GTX 950M has compute capability 5.0, so you have the same hardware as me. Attached is my own ctranslate2.dll, compiled on Windows 11 against cuDNN 9; you can try replacing the ctranslate2.dll in the faster-whisper package directory with it. Note that you must install a recent graphics driver, CUDA Toolkit 12, and cuDNN 9 (cuDNN 9 supports CUDA 12); otherwise other DLL dependencies will keep it from working. This DLL may not support other cards, because it was built only for compute capability 5.0, so the safest route is still the version 3.24 ctranslate2.dll. ctranslate2_cuda5.0.zip
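For reference, the cards reported in this thread are all Maxwell-generation parts. A small sketch of the check (capabilities taken from NVIDIA's published compute-capability tables; the 6.0 cutoff is an assumption inferred from the reports here that recent prebuilt CTranslate2 binaries no longer cover 5.x cards):

```python
# Compute capabilities of the GPUs reported in this thread
# (values from NVIDIA's published compute-capability tables).
GPU_COMPUTE_CAPABILITY = {
    "GeForce GTX 950": (5, 2),
    "GeForce GTX 950M": (5, 0),
    "GeForce GTX 750 Ti": (5, 0),
    "Tesla M40": (5, 2),
}

def needs_legacy_ctranslate2(gpu_name, cutoff=(6, 0)):
    """True if the card likely needs the 3.24-era CTranslate2 build.

    The (6, 0) cutoff is an assumption based on this thread: all affected
    cards are Maxwell (5.x), and the 3.24 binaries resolved the error.
    """
    return GPU_COMPUTE_CAPABILITY[gpu_name] < cutoff

print(needs_legacy_ctranslate2("GeForce GTX 950"))  # True
```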

laraws commented 3 months ago

Thank you for your reply. My system is Ubuntu; can I still use the "ctranslate2.dll"?

risacher commented 3 months ago

@laraws No, a .dll is a Windows shared library and will not work on Ubuntu. I have also been trying to get faster-whisper to work on my Ubuntu 22.04 server with a GTX 750 Ti (which is compute capability 5.0, like others in this thread.) As noted above, the version of CTranslate2 bundled with faster-whisper does not support this, so I recompiled it from source.

This was slightly tricky as I wanted to be able to run on both CPU and GPU. CPU support requires Intel MKL, and the 22.04 package for that is missing a pkg-config file.

So, for reference for anyone else trying this, I configured CTranslate2 with `cmake .. -DWITH_MKL=ON -DWITH_CUDA=ON -DWITH_CUDNN=ON -DMKL_ROOT=/usr -DMKL_INCLUDE_DIR=/usr/include/mkl`. For cmake to auto-detect the compute capability, it must be run on the machine with the GPU, with the NVIDIA driver and driver API installed and matching. If you are compiling on a different machine, you should be able to add `-DCMAKE_CUDA_ARCHITECTURES=50` to force compute capability 5.0.

Be sure to add /usr/local/lib to $LD_LIBRARY_PATH and /usr/local/bin to $PATH, and delete the ctranslate2.so files installed in ~/.local/lib/python3.10/site-packages/. (I think they were in site-packages/ctranslate2.libs/ or something like that). I think to compile CTranslate2 I also had to have CUDA installed and nvcc must be in the PATH, so I needed to add /usr/local/cuda/bin to PATH and /usr/local/cuda/lib64 to LD_LIBRARY_PATH.
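Putting the steps above together, the build might look roughly like this (a sketch based on CTranslate2's generic CMake workflow; the `-j` count and install paths are illustrative, and rebuilding the Python wrapper may need additional build requirements):

```shell
# Make the CUDA toolchain visible before configuring (see the notes above).
export PATH=/usr/local/cuda/bin:/usr/local/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/lib:$LD_LIBRARY_PATH

git clone --recursive https://github.com/OpenNMT/CTranslate2.git
cd CTranslate2 && mkdir build && cd build
# -DCMAKE_CUDA_ARCHITECTURES=50 forces sm_50 when not building on the GPU host.
cmake .. -DWITH_MKL=ON -DWITH_CUDA=ON -DWITH_CUDNN=ON \
    -DMKL_ROOT=/usr -DMKL_INCLUDE_DIR=/usr/include/mkl \
    -DCMAKE_CUDA_ARCHITECTURES=50
make -j"$(nproc)" && sudo make install
# Rebuild the Python wrapper against the freshly installed library.
cd ../python && pip install .
```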

All that said, when I tried to run the sample code on the GPU with a 30-second sample, it segfaulted, so I'd be interested if anyone else gets it to work. It works on the CPU, but too slow for my application.

risacher commented 3 months ago

I also note that CTranslate2 apparently thinks the GTX 750 Ti only really supports float32, which I determined by running

```python
import ctranslate2

t = ctranslate2.get_supported_compute_types("cuda")
print(f'get_supported_compute_types("cuda"): {t}')
```

As a result I requantized the model to float32 like this:

```shell
ct2-transformers-converter --model openai/whisper-small.en --output_dir f32-whisper-small.en \
    --copy_files tokenizer.json --quantization float32
```
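Once converted, the local float32 model can be loaded by passing the output directory instead of a model name (a sketch; `audio.wav` is a placeholder, and a working CUDA setup is assumed):

```python
from faster_whisper import WhisperModel

# Point faster-whisper at the locally converted directory;
# compute_type="float32" matches what the converter produced.
model = WhisperModel("f32-whisper-small.en", device="cuda", compute_type="float32")
segments, info = model.transcribe("audio.wav")
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```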

gbozo commented 3 months ago

For those struggling with old hardware: like @risacher I tried recompiling, which succeeded but turned out not to be needed after all. This works on my GTX 750 with compute capability 5.0; just use this Dockerfile (derived from rhasspy/wyoming-whisper):

```dockerfile
FROM rhasspy/wyoming-whisper:latest

RUN apt-get update \
    && apt install wget sudo -y \
    && sed -i 's/main/main contrib non-free/g' /etc/apt/sources.list \
    && echo "deb [signed-by=/usr/share/keyrings/nvidia-drivers.gpg] https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/ /" > /etc/apt/sources.list.d/nvidia-drivers.list \
    && apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/3bf863cc.pub \
    && apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/debian11/x86_64/7fa2af80.pub \
    && wget https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.1-1_all.deb \
    && dpkg -i cuda-keyring_1.1-1_all.deb \
    && echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/ /" | sudo tee /etc/apt/sources.list.d/cuda-debian11-x86_64.list \
    && cat /etc/apt/sources.list \
    && apt-get update \
    && apt-get install -y --no-install-recommends \
        libcublas11 \
        libcudnn8 \
        libcublaslt11 \
        libcublas-12-5 \
    && pip3 install --no-cache-dir -U \
        ctranslate2==3.24.0 \
    && apt-get purge -y --auto-remove \
    && rm -rf /var/lib/apt/lists/*
```

Save it as "Dockerfile" in its own folder (better), and build it with `docker build . -t whisper:latest`.

Then run your image; remember it is called whisper:latest, not rhasspy:…, and the rest is as usual. If you get "invalid" or similar errors, you don't have enough memory. The main problem when using ctranslate2 3.24 with the latest 555 driver is that you need the cuBLAS 12 library (from toolkit 12-5), but also cuBLAS 11 and cuDNN 8. Good luck.
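Starting the rebuilt image might then look like this (a sketch; `--gpus all` assumes the NVIDIA Container Toolkit is installed, and 10300 is the Wyoming protocol's usual port, neither of which is stated in this thread):

```shell
# Run the rebuilt image with GPU access exposed to the container.
docker run -d --name whisper --gpus all -p 10300:10300 whisper:latest
```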

gbozo commented 3 months ago

For the sake of completeness, the docker compose file:

```yaml
services:
  whisper:
    container_name: whisper
    image: whisper:latest
    command: --model base --language en --beam-size 5 --device cuda
    environment:
```

pixu2019 commented 1 month ago

@gbozo please reformat your answer; it's quite hard to understand or copy and paste. Thanks.