SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

On arm64, 'for segment in segments' takes a long time #833

Open HduHestin opened 1 month ago

HduHestin commented 1 month ago
import time  # needed for the timing calls below

# model and audio_file are defined earlier (not shown)
start_time = time.time()
segments, info = model.transcribe(audio_file, beam_size=5)
end_time = time.time()
print("inference time: " + str(end_time - start_time))
print_wav_info(audio_file)  # the poster's own helper (not shown)
start2 = time.time()
for segment in segments:
    print(type(segment))
    # print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
end2 = time.time()
print("fetch segments time: " + str(end2 - start2))

audio_file is 5 seconds long and the inference time is about 3 seconds, which is normal. But the fetch time is too long, even if I only print(type(segment)). I can't find a solution.
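
One thing worth checking about the split timings: the faster-whisper README notes that `segments` is a generator, so the transcription only starts when it is iterated. If that applies here, most of the decoding cost would show up in the `for` loop rather than in the `model.transcribe` call. The sketch below only imitates that behaviour with a hypothetical stand-in, `fake_transcribe` (not part of faster-whisper), which sleeps instead of decoding.

```python
import time


def fake_transcribe(n_segments=3, cost_per_segment=0.05):
    """Hypothetical stand-in for a lazy transcribe(): work happens per yielded item."""

    def generate():
        for i in range(n_segments):
            time.sleep(cost_per_segment)  # pretend the decoding happens here
            yield f"segment {i}"

    return generate()  # returns immediately; nothing has been "decoded" yet


start = time.perf_counter()
segments = fake_transcribe()
call_time = time.perf_counter() - start

start = time.perf_counter()
results = list(segments)  # consuming the generator triggers the work
loop_time = time.perf_counter() - start

print(f"call: {call_time:.3f}s, iteration: {loop_time:.3f}s")
```

With the real library, timing the call and the full iteration together (for example around `list(segments)`) would give the total transcription time.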

HduHestin commented 1 month ago

Platform: Linux lubancat 4.19.232 #12 SMP Fri Nov 10 10:22:31 CST 2023 aarch64 GNU/Linux

trungkienbkhn commented 1 month ago

@Hestinorwu, hello. Maybe the model is hallucinating a lot on your audio, so it retries each segment at higher temperatures and takes longer to process. For example:

Processing audio with duration 01:48.553
Processing segment at 00:00.000
Compression ratio threshold is not met with temperature 0.0 (4.642857 > 2.400000)
Compression ratio threshold is not met with temperature 0.2 (4.187500 > 2.400000)
Compression ratio threshold is not met with temperature 0.4 (18.111111 > 2.400000)
Compression ratio threshold is not met with temperature 0.6 (5.089552 > 2.400000)
Log probability threshold is not met with temperature 0.6 (-1.003701 < -1.000000)
Log probability threshold is not met with temperature 0.8 (-2.498282 < -1.000000)
Log probability threshold is not met with temperature 1.0 (-4.083577 < -1.000000)
Reset prompt. prompt_reset_on_temperature threshold is met 1.000000 > 0.500000
Processing segment at 00:10.420
[0.00s -> 8.00s]  Qasid'aq Al-Maraham
[8.00s -> 10.42s]  Iqul
Compression ratio threshold is not met with temperature 0.0 (14.322581 > 2.400000)
Compression ratio threshold is not met with temperature 0.2 (24.470588 > 2.400000)
...
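
The retries in a log like this come from two checks on each decoded segment: a compression ratio above 2.4 (very repetitive text) or an average log probability below -1.0 triggers another attempt at the next temperature. Below is a minimal, simplified sketch of that decision, assuming the ratio is computed with zlib as in the original Whisper code. The helper names `compression_ratio` and `needs_fallback` are illustrative, not the library's API, and the no-speech check is left out.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw byte length divided by zlib-compressed length; high for repetitive text."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def needs_fallback(text, avg_logprob,
                   compression_ratio_threshold=2.4,
                   log_prob_threshold=-1.0):
    """Simplified retry decision using the thresholds shown in the log above."""
    if compression_ratio(text) > compression_ratio_threshold:
        return True  # output looks too repetitive
    if avg_logprob < log_prob_threshold:
        return True  # model was not confident in its output
    return False


repetitive = "la " * 100
normal = "The quick brown fox jumps over the lazy dog."

print(needs_fallback(repetitive, avg_logprob=-0.3))
print(needs_fallback(normal, avg_logprob=-0.3))
print(needs_fallback(normal, avg_logprob=-2.5))
```

Each retry decodes the segment again, so a segment that fails at every temperature costs several times the work of one that passes on the first attempt.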

Could you show your log and attach your example audio?
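
To get a log like the one above, debug logging can be turned on before calling `transcribe`. This is a minimal sketch using only the standard `logging` module; the logger name `faster_whisper` follows the project README, and the variable name `fw_logger` is just for illustration.

```python
import logging

# Attach a default handler so log records are printed to the console.
logging.basicConfig()

# Lower the threshold of the library's logger to DEBUG.
fw_logger = logging.getLogger("faster_whisper")
fw_logger.setLevel(logging.DEBUG)
```

With this in place, messages such as "Processing segment at ..." and the threshold checks should appear while the segments are being iterated.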