SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

ValueError: Requested int8 compute type, but the target device or backend do not support efficient int8 computation. #955

Open facundobatista opened 1 month ago

facundobatista commented 1 month ago

Hello!

I'm getting this error from the library when running whisperx on an mp3. The complete traceback is:

Traceback (most recent call last):
  File "/home/facundo/devel/envwhisperx/bin/whisperx", line 8, in <module>
    sys.exit(cli())
  File "/home/facundo/devel/envwhisperx/lib/python3.10/site-packages/whisperx/transcribe.py", line 171, in cli
    model = load_model(model_name, device=device, device_index=device_index, download_root=model_dir, compute_type=compute_type, language=args['language'], asr_options=asr_options, vad_options={"vad_onset": vad_onset, "vad_offset": vad_offset}, task=task, threads=faster_whisper_threads)
  File "/home/facundo/devel/envwhisperx/lib/python3.10/site-packages/whisperx/asr.py", line 289, in load_model
    model = model or WhisperModel(whisper_arch,
  File "/home/facundo/devel/envwhisperx/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 130, in __init__
    self.model = ctranslate2.models.Whisper(
ValueError: Requested int8 compute type, but the target device or backend do not support efficient int8 computation.

I'm running it like this: whisperx encuentro.mp3 --compute_type int8 --model large-v2 --verbose True --language es

I'm in a virtualenv created with these dependencies:

ctranslate2 < 4
dateutils
requests
whisperx

ctranslate2 is pinned below 4 (actually using 3.24.0) because the GeForce GT740 is limited to NVIDIA driver 470, which provides CUDA 11.4.

GPU details:

$ nvidia-smi 
Wed Aug  7 13:29:59 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.256.02   Driver Version: 470.256.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:07:00.0 N/A |                  N/A |
| 30%   40C    P8    N/A /  N/A |    394MiB /  1998MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Any idea what is going on? How can I fix it, or work around it? Thanks!!

ozancaglayan commented 4 weeks ago

Hi,

You are using whisperx, but that tool is not part of this repository.

facundobatista commented 4 weeks ago

But the error is inside the library:

  File "/home/facundo/devel/envwhisperx/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 130, in __init__
    self.model = ctranslate2.models.Whisper(

Maybe there's a better way to reproduce it using only the library?
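For reference, a minimal sketch of how the same error could be reproduced with faster-whisper alone, without whisperx (assuming faster-whisper is installed, and reusing the model, language, and file name from the original command line):

```python
# Minimal reproduction sketch using faster-whisper directly.
# Assumes "encuentro.mp3" is in the working directory and a CUDA
# device is available; these values mirror the whisperx invocation.
def reproduce(path="encuentro.mp3", compute_type="int8"):
    # Imported inside the function so the sketch loads without the package.
    from faster_whisper import WhisperModel

    # The same ValueError is raised here when the GPU cannot do
    # efficient int8 computation.
    model = WhisperModel("large-v2", device="cuda", compute_type=compute_type)
    segments, info = model.transcribe(path, language="es")
    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")

if __name__ == "__main__":
    reproduce()
```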

ozancaglayan commented 4 weeks ago

Oh, sorry. It's actually unrelated to the libraries; your GPU is too old:

INT8 precision requires a CUDA GPU with a compute capability of 6.1, 7.0, or higher

whereas the GT740's compute capability is 3.0 (see https://developer.nvidia.com/cuda-gpus).

You should simply use float32 or auto as the compute type.
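The rule quoted above can be sketched as a small check (this is an illustration of the stated requirement, not CTranslate2's actual detection code; the example GPU names are assumptions based on NVIDIA's compute-capability table):

```python
# Efficient int8 on GPU needs compute capability 6.1, or 7.0 and above,
# per the requirement quoted above.
def supports_efficient_int8(major: int, minor: int) -> bool:
    return (major, minor) == (6, 1) or (major, minor) >= (7, 0)

print(supports_efficient_int8(3, 0))  # GeForce GT740 -> False
print(supports_efficient_int8(6, 1))  # e.g. GTX 1080 -> True
print(supports_efficient_int8(7, 5))  # e.g. RTX 2080 -> True
```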

facundobatista commented 4 weeks ago

Ah, my bad, sorry; I thought int8 was the "simplest" one.

BTW, float32 doesn't work either, but I don't really know what to report, because I just get a segmentation fault :(

julienatry commented 1 week ago

+1 on this: I get a segmentation fault running the project's usage example on a GTX 960 with float32.