CheshireCC / faster-whisper-GUI

faster_whisper GUI with PySide6
GNU Affero General Public License v3.0
1.72k stars 104 forks

(080) Crash to desktop when transcribing with kotoba-whisper-v2.0-faster #244

Open A2Sumie opened 1 month ago

A2Sumie commented 1 month ago

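The log cuts off abruptly and no error was recorded. Just in case, I toggled the v3 model switch, which made no difference: the log below was captured with the v3 switch off, but the symptoms are identical with it on. The regular v3 and v3 turbo models work without problems. The faster-whisper 1.0.3 release notes don't mention any fix that looks related to this kind of problem, so I'm filing an issue. That said, even if this model does run, I doubt the results will be satisfying (x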

The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
torchvision is not available - cannot save figures
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.

faster_whisper_GUI: 0.8.0
==========2024-10-21_09:39:12==========
==========Start==========

language: zh

==========2024-10-21_09:39:44==========
==========LoadModel==========

    -model_size_or_path: E:/WhisperModel/kotoba-whisper-v2.0-faster
    -device: cuda
    -device_index: 0
    -compute_type: float16
    -cpu_threads: 10
    -num_workers: 1
    -download_root: C:/Users/kaito/.cache/huggingface/hub
    -local_files_only: False
    -use_v3_model: False

Load over
E:/WhisperModel/kotoba-whisper-v2.0-faster
max_length:             448
num_samples_per_token:  320
time_precision:  0.02
tokens_per_second:  50
input_stride:  2

==========2024-10-21_09:39:56==========
==========Process==========

redirect std output
vad_filter : True
    -threshold                : 0.2
    -min_speech_duration_ms   : 10
    -max_speech_duration_s    : inf
    -min_silence_duration_ms  : 10
    -window_size_samples      : 1536
    -speech_pad_ms            : 300
Transcribes options:
    -audio : ['E:/Downloads/3_p3_(vocals).flac']
    -language : None
    -task : False
    -beam_size : 16
    -best_of : 10
    -patience : 3.6
    -length_penalty : 1.0
    -temperature : [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    -compression_ratio_threshold : 4.0
    -log_prob_threshold : -1.0
    -no_speech_threshold : 0.95
    -condition_on_previous_text : False
    -initial_prompt : None
    -prefix : None
    -suppress_blank : False
    -suppress_tokens : [-1]
    -without_timestamps : False
    -max_initial_timestamp : 1.0
    -word_timestamps : True
    -prepend_punctuations : 
    -append_punctuations : .。,,!!??、
    -repetition_penalty : 1.0
    -no_repeat_ngram_size : 0
    -prompt_reset_on_temperature : 0.5
    -max_new_tokens : None
    -chunk_length : 30.0
    -clip_mode : 0
    -clip_timestamps : 0
    -hallucination_silence_threshold : 4.0
    -hotwords : 
    -language_detection_threshold : None
    -language_detection_segments : 1
create transcribe process with 1 workers
start transcribe process

2024-10-21_09:39:57 - faster_whisper - INFO - Processing audio with duration 11:29.006
2024-10-21_09:39:59 - faster_whisper - INFO - VAD filter removed 00:24.230 of audio
2024-10-21_09:39:59 - faster_whisper - DEBUG - VAD filter kept the following audio segments: [00:00.180 -> 00:05.196], [00:06.228 -> 00:26.880], [00:26.880 -> 00:34.848], [00:34.848 -> 00:46.956], [00:47.796 -> 00:49.632], [00:49.632 -> 01:40.332], [01:40.404 -> 01:49.452], [01:49.908 -> 01:54.336], [01:54.336 -> 02:16.512], [02:16.512 -> 02:17.952], [02:17.952 -> 02:19.392], [02:19.392 -> 02:20.640], [02:20.640 -> 02:21.420], [02:21.588 -> 02:42.624], [02:42.624 -> 02:48.864], [02:48.864 -> 03:02.544], [03:02.544 -> 03:04.908], [03:05.556 -> 03:21.456], [03:21.456 -> 03:29.760], [03:29.760 -> 03:34.284], [03:34.740 -> 05:18.000], [05:18.000 -> 05:19.632], [05:19.632 -> 05:31.116], [05:31.476 -> 05:46.464], [05:46.464 -> 05:48.000], [05:48.000 -> 06:25.872], [06:25.872 -> 06:29.136], [06:29.136 -> 07:44.748], [07:53.940 -> 09:08.160], [09:08.160 -> 09:15.264], [09:15.264 -> 09:19.116], [09:19.284 -> 09:24.720], [09:24.720 -> 09:39.024], [09:39.024 -> 09:43.776], [09:43.776 -> 09:50.976], [09:50.976 -> 10:03.552], [10:03.552 -> 10:09.168], [10:09.168 -> 10:10.224], [10:10.224 -> 10:13.296], [10:13.296 -> 10:16.224], [10:16.224 -> 10:17.808], [10:17.808 -> 10:27.936], [10:27.936 -> 10:33.936], [10:33.936 -> 10:37.296], [10:37.296 -> 10:38.592], [10:38.592 -> 10:41.424], [10:41.424 -> 10:43.296], [10:43.296 -> 11:06.192], [11:06.192 -> 11:10.032], [11:10.032 -> 11:11.808], [11:11.808 -> 11:18.348]
2024-10-21_09:40:00 - faster_whisper - INFO - Detected language 'ja' with probability 0.99
2024-10-21_09:40:00 - faster_whisper - DEBUG - Processing segment at 00:00.000
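
The log above stops at the first segment with no Python error, which is what a crash in native code (for example inside a CUDA library) tends to look like, although the thread does not confirm the cause. A minimal sketch of one way to capture more detail when running from source, using Python's standard `faulthandler` module. The helper name `enable_crash_log` and the log path are hypothetical and not part of faster-whisper-GUI:

```python
import faulthandler


def enable_crash_log(path):
    """Dump Python tracebacks of all threads to `path` on a fatal signal.

    Hypothetical helper, not part of faster-whisper-GUI. The returned file
    object must stay open for as long as faulthandler is enabled.
    """
    log_file = open(path, "w", encoding="utf-8")
    # faulthandler writes a traceback on fatal errors such as SIGSEGV or
    # SIGABRT, which normally end the process without a Python exception.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `enable_crash_log("crash.log")` once at startup, before the model is loaded, would leave a traceback in that file if the process dies inside a native call. It only shows which Python call was active at the time, not the underlying native fault.
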
CheshireCC commented 1 month ago

Try turning off word-level timestamps.
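
For anyone reproducing this outside the GUI, the suggestion above corresponds to passing `word_timestamps=False` to faster-whisper's `WhisperModel.transcribe`. The sketch below reuses a subset of the options from the log. The helper names `build_transcribe_options` and `transcribe_without_word_timestamps` are hypothetical, and this is an illustration under those assumptions rather than the GUI's actual code path:

```python
from typing import Any, Dict, List, Tuple


def build_transcribe_options(word_timestamps: bool = False) -> Dict[str, Any]:
    """Subset of the options shown in the log, with word_timestamps configurable."""
    return {
        "beam_size": 16,
        "best_of": 10,
        "patience": 3.6,
        "condition_on_previous_text": False,
        "vad_filter": True,
        "vad_parameters": {
            "threshold": 0.2,
            "min_speech_duration_ms": 10,
            "min_silence_duration_ms": 10,
            "speech_pad_ms": 300,
        },
        # The setting the maintainer suggests disabling.
        "word_timestamps": word_timestamps,
    }


def transcribe_without_word_timestamps(
    audio_path: str, model_path: str
) -> List[Tuple[float, float, str]]:
    """Hypothetical helper: run one file with word-level timestamps off."""
    # Imported lazily so the options helper works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_path, device="cuda", compute_type="float16")
    segments, _info = model.transcribe(audio_path, **build_transcribe_options(False))
    # `segments` is a generator; the work happens while iterating over it.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Running the same file once with `build_transcribe_options(True)` and once with `False` would show whether the crash follows the word-timestamp setting independently of the GUI.
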

A2Sumie commented 1 month ago

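Turning it off worked; the transcription finished normally. My environment blew up when I tried standalone whisperX, though, so I don't know whether that will work going forward.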

CheshireCC commented 1 month ago

whisperX needs a VPN or proxy to get around network restrictions.

A2Sumie commented 1 month ago

It's not a network restriction problem; I'm in Japan. I found that repeatedly clicking the tabs lets whisperX run relatively normally. I'm not sure whether something is going on in the GUI.