SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

Some segments are shifted by 1 second after PR #856 #1140

heimoshuiyu closed 1 week ago

heimoshuiyu commented 1 week ago

I appreciate your hard work.


audio (2 minutes): 01.aac.zip

The correct SRT result (using commit fbcf58b, which is before the huge PR #856): 01.old.srt.zip

The wrong SRT result (using latest commit 85e61ea): 01.new.srt.zip


I am not using the batch version

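import faster_whisper

# Path to the attached audio; assumed here to be extracted from 01.aac.zip.
audio = '01.aac'

model = faster_whisper.WhisperModel(
    model_size_or_path='large-v2',
    device='cuda',
    cpu_threads=4,
)
# transcribe() returns a lazy generator of segments plus transcription info.
segments, info = model.transcribe(
    audio=audio,
    language=None,
    task='transcribe',
    vad_filter=False,
    initial_prompt=None,
    word_timestamps=True,
    repetition_penalty=1.0,
)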

The script is from this project: https://github.com/heimoshuiyu/whisper-fastapi
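Since the attached results are SRT files, here is a minimal sketch of how transcribed segments could be written out as SRT, in case it helps with reproducing them. It is not the whisper-fastapi code: `Seg`, `srt_timestamp`, and `to_srt` are hypothetical names used only for illustration, and `Seg` merely stands in for the `start`, `end`, and `text` attributes that faster-whisper segments expose.

```python
from typing import Iterable, NamedTuple


class Seg(NamedTuple):
    """Hypothetical stand-in for a transcribed segment (start/end in seconds)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable) -> str:
    """Render segments as SRT cues, numbered from 1 and separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```

Any iterable of objects with `start`, `end`, and `text` can be passed to `to_srt`, so the segment generator returned by `model.transcribe(...)` should work the same way as the `Seg` tuples.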


image

Some segments on the left (wrong) are shifted by +1 second compared with the right (correct).
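For anyone who wants to check the shift without comparing the files by eye, a rough sketch like the following could list the affected cues. It assumes both SRT files contain cues with identical text, which may not hold if the segmentation itself changed; `parse_srt`, `find_shifts`, and the 0.5 s threshold are hypothetical choices, not part of faster-whisper.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, cue_text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIME_RE.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                cues.append((start, end, " ".join(lines[i + 1:]).strip()))
                break
    return cues


def find_shifts(old_cues, new_cues, threshold=0.5):
    """Pair cues by identical text and report start-time differences >= threshold."""
    old_start_by_text = {}
    for start, _end, text in old_cues:
        # Keep the first occurrence if the same text appears more than once.
        old_start_by_text.setdefault(text, start)
    shifted = []
    for start, _end, text in new_cues:
        if text in old_start_by_text:
            delta = start - old_start_by_text[text]
            if abs(delta) >= threshold:
                shifted.append((text, round(delta, 3)))
    return shifted
```

Reading 01.old.srt and 01.new.srt into strings and passing the parsed cues to `find_shifts` would give a list of `(text, delta_seconds)` pairs; a delta near +1.0 corresponds to the shift described above.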


I also tested the commit of PR #856 (eb839023), which is worse.

result SRT: 01.eb839023.srt.zip

image

Left: commit eb839023 (PR #856); middle: latest commit 85e61ea; right: commit fbcf58b.