CheshireCC / faster-whisper-GUI

faster_whisper GUI with PySide6
GNU Affero General Public License v3.0

Transcription error #152

Open jeshs opened 4 months ago

jeshs commented 4 months ago

Environment: Windows 10, NVIDIA GPU, CUDA Version: 12.3
After installing 0.8, I only did the following:
1. Model parameters: downloaded the medium.en model online, using the local cache.
2. Transcription parameters: set the audio language to en-English.
3. Ran transcription: added an MP4 file (English audio) and started transcribing. The spinner just kept spinning with no progress, and the fasterwhispergui.log log shows the error below.

==========2024-06-08_11:10:44==========
==========Process==========

redirect std output
vad_filter : True
    -threshold                : 0.5
    -min_speech_duration_ms   : 250
    -max_speech_duration_s    : inf
    -min_silence_duration_ms  : 2000
    -window_size_samples      : 1024
    -speech_pad_ms            : 400
Transcribes options:
    -audio : ['D:/Download/video/How Large Language Models Work.mp4']
    -language : en
    -task : False
    -beam_size : 5
    -best_of : 5
    -patience : 1.0
    -length_penalty : 1.0
    -temperature : [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    -compression_ratio_threshold : 2.4
    -log_prob_threshold : -1.0
    -no_speech_threshold : 0.6
    -condition_on_previous_text : False
    -initial_prompt : None
    -prefix : None
    -suppress_blank : True
    -suppress_tokens : [-1]
    -without_timestamps : False
    -max_initial_timestamp : 1.0
    -word_timestamps : False
    -prepend_punctuations : "'“¿([{-
    -append_punctuations : "'.。,,!!??::”)]}、
    -repetition_penalty : 1.0
    -no_repeat_ngram_size : 0
    -prompt_reset_on_temperature : 0.5
    -max_new_tokens : None
    -chunk_length : 30.0
    -clip_mode : 0
    -clip_timestamps : 0
    -hallucination_silence_threshold : 0.5
    -hotwords : 
    -language_detection_threshold : None
    -language_detection_segments : 1
create transcribe process with 1 workers
start transcribe process
Traceback (most recent call last):
  File "D:\Program Files (x86)\FasterWhisperGUI\faster_whisper_GUI\transcribe.py", line 371, in run
  File "D:\Program Files (x86)\FasterWhisperGUI\concurrent\futures\_base.py", line 621, in result_iterator
  File "D:\Program Files (x86)\FasterWhisperGUI\concurrent\futures\_base.py", line 319, in _result_or_cancel
  File "D:\Program Files (x86)\FasterWhisperGUI\concurrent\futures\_base.py", line 458, in result
  File "D:\Program Files (x86)\FasterWhisperGUI\concurrent\futures\_base.py", line 403, in __get_result
  File "D:\Program Files (x86)\FasterWhisperGUI\concurrent\futures\thread.py", line 58, in run
  File "D:\Program Files (x86)\FasterWhisperGUI\faster_whisper_GUI\transcribe.py", line 281, in transcribe_file
  File "D:\Program Files (x86)\FasterWhisperGUI\faster_whisper\transcribe.py", line 1175, in restore_speech_timestamps
  File "D:\Program Files (x86)\FasterWhisperGUI\faster_whisper\transcribe.py", line 573, in generate_segments
  File "D:\Program Files (x86)\FasterWhisperGUI\faster_whisper\transcribe.py", line 824, in encode
RuntimeError: parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
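
For reference, the same failure should be reproducible outside the GUI with the faster-whisper Python API directly. This is a minimal sketch (not the GUI's own code) using the parameters shown in the log above and the model settings described in the issue; the error surfaces once segments are actually iterated, because that is when the encoder runs on the GPU.

```python
# Minimal reproduction sketch, assuming the faster-whisper package is installed
# and the medium.en model is already present in the local Hugging Face cache.
from faster_whisper import WhisperModel

model = WhisperModel(
    "medium.en",
    device="cuda",           # same device as the GUI run
    compute_type="float32",  # matches the LoadModel settings reported later in the thread
)

segments, info = model.transcribe(
    "D:/Download/video/How Large Language Models Work.mp4",
    language="en",
    beam_size=5,
    vad_filter=True,
    condition_on_previous_text=False,
)

# The RuntimeError is raised here, when the lazy segment generator is consumed.
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```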
CheshireCC commented 4 months ago

This is caused by a mismatch between the current GPU and the CUDA compute capability that the PyTorch/CTranslate2 dependencies were built for. Since the CTranslate2 engine now only supports CUDA 12, I have already upgraded torch and CUDA to the latest versions. You can try updating your graphics driver; if that still doesn't work, I will try to address this in the next release.
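
cudaErrorNoKernelImageForDevice generally means the installed binary was built without kernels for the GPU's architecture, so one quick check is to query the device's compute capability and what the installed CTranslate2 build reports as supported. A minimal sketch, assuming torch and ctranslate2 are importable from the same environment the GUI ships with:

```python
# Sketch: print the GPU's compute capability and CTranslate2's supported compute types.
import torch
import ctranslate2

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")
else:
    print("CUDA is not available to torch")

# Compute types the installed CTranslate2 build can actually run on each device.
print("CUDA compute types:", ctranslate2.get_supported_compute_types("cuda"))
print("CPU compute types: ", ctranslate2.get_supported_compute_types("cpu"))
```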

jeshs commented 4 months ago

After upgrading to the versions below, it still fails with the same error: NVIDIA-SMI 555.85, Driver Version: 555.85, CUDA Version: 12.5

The error in fasterwhispergui.log is as follows:

The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
torchvision is not available - cannot save figures
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.

faster_whisper_GUI: 0.8.0
==========2024-06-14_09:25:34==========
==========Start==========

current computer language region-format: zh_CN
language: zh

==========2024-06-14_09:26:10==========
==========LoadModel==========

    -model_size_or_path: medium.en
    -device: cuda
    -device_index: 0
    -compute_type: float32
    -cpu_threads: 4
    -num_workers: 1
    -download_root: C:/Users/user/.cache/huggingface/hub
    -local_files_only: True
    -use_v3_model: False

Load over
medium.en
max_length:             448
num_samples_per_token:  320
time_precision:  0.02
tokens_per_second:  50
input_stride:  2

==========2024-06-14_09:28:04==========
==========Process==========

redirect std output
vad_filter : True
    -threshold                : 0.5
    -min_speech_duration_ms   : 250
    -max_speech_duration_s    : inf
    -min_silence_duration_ms  : 2000
    -window_size_samples      : 1024
    -speech_pad_ms            : 400
Transcribes options:
    -audio : ['D:/Download/video/How Large Language Models Work.mp4']
    -language : en
    -task : False
    -beam_size : 5
    -best_of : 5
    -patience : 1.0
    -length_penalty : 1.0
    -temperature : [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    -compression_ratio_threshold : 2.4
    -log_prob_threshold : -1.0
    -no_speech_threshold : 0.6
    -condition_on_previous_text : False
    -initial_prompt : None
    -prefix : None
    -suppress_blank : True
    -suppress_tokens : [-1]
    -without_timestamps : False
    -max_initial_timestamp : 1.0
    -word_timestamps : False
    -prepend_punctuations : "'“¿([{-
    -append_punctuations : "'.。,,!!??::”)]}、
    -repetition_penalty : 1.0
    -no_repeat_ngram_size : 0
    -prompt_reset_on_temperature : 0.5
    -max_new_tokens : None
    -chunk_length : 30.0
    -clip_mode : 0
    -clip_timestamps : 0
    -hallucination_silence_threshold : 0.5
    -hotwords : 
    -language_detection_threshold : None
    -language_detection_segments : 1
create transcribe process with 1 workers
start transcribe process
Traceback (most recent call last):
  File "D:\Program Files (x86)\FasterWhisperGUI\faster_whisper_GUI\transcribe.py", line 371, in run
  File "D:\Program Files (x86)\FasterWhisperGUI\concurrent\futures\_base.py", line 621, in result_iterator
  File "D:\Program Files (x86)\FasterWhisperGUI\concurrent\futures\_base.py", line 319, in _result_or_cancel
  File "D:\Program Files (x86)\FasterWhisperGUI\concurrent\futures\_base.py", line 458, in result
  File "D:\Program Files (x86)\FasterWhisperGUI\concurrent\futures\_base.py", line 403, in __get_result
  File "D:\Program Files (x86)\FasterWhisperGUI\concurrent\futures\thread.py", line 58, in run
  File "D:\Program Files (x86)\FasterWhisperGUI\faster_whisper_GUI\transcribe.py", line 281, in transcribe_file
  File "D:\Program Files (x86)\FasterWhisperGUI\faster_whisper\transcribe.py", line 1175, in restore_speech_timestamps
  File "D:\Program Files (x86)\FasterWhisperGUI\faster_whisper\transcribe.py", line 573, in generate_segments
  File "D:\Program Files (x86)\FasterWhisperGUI\faster_whisper\transcribe.py", line 824, in encode
RuntimeError: parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
CheshireCC commented 4 months ago

What is your GPU model and spec?

jeshs commented 4 months ago

What is your GPU model and spec?

GTX970M

CheshireCC commented 4 months ago

...I'm afraid this GPU can't be supported.
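
If GPU support cannot be provided for this card, one possible workaround is to run the model on the CPU instead. A minimal sketch with the underlying faster-whisper API, assuming the same medium.en model; int8 quantization keeps CPU memory use and runtime manageable:

```python
# Sketch of a CPU fallback with faster-whisper (no CUDA kernels involved).
from faster_whisper import WhisperModel

model = WhisperModel(
    "medium.en",
    device="cpu",
    compute_type="int8",  # quantized weights: slower than GPU, but avoids the CUDA error
    cpu_threads=4,
)

segments, info = model.transcribe(
    "D:/Download/video/How Large Language Models Work.mp4",
    language="en",
    vad_filter=True,
)

for segment in segments:
    print(segment.text)
```

In the GUI this would correspond to selecting cpu as the device in the model-parameter settings instead of the cuda/float32 combination shown in the LoadModel log above.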