Transcription stopped halfway

Purfview / whisper-standalone-win

Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.


Transcription stopped halfway #36

Closed by nhan000 1 year ago

nhan000 commented 1 year ago

I downloaded this 27-minute YouTube video (uploaded it here).

I ran the transcription using this command:

whisper-faster "C:\Users\ntnha\Videos\4K Video Downloader\Carl Sagan Astronomer of the People.mp4" --language en --model large-v2 --batch_recursive true

and it stopped at [13:15.860 --> 13:18.860] His greatest achievement was just around the corner.

I then downloaded the mp3 file from that YouTube video (uploaded it here) and ran the same command on it:

whisper-faster "C:\Users\ntnha\Videos\4K Video Downloader\Carl Sagan Astronomer of the People.mp3" --language en --model large-v2 --batch_recursive true

and it was able to run to [26:44.760 --> 26:46.180] might have been enough.

Interestingly, it didn't transcribe the advertisement at the beginning and at the end of the video.

Purfview commented 1 year ago

Check whether the .srt subtitle file has been created when you think it has "stopped".

nhan000 commented 1 year ago

The .srt file was created, but the latter half was missing. It ends at the same timestamp as the output in the command prompt.
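For anyone who wants to automate that check, here is a minimal Python sketch that reads the latest cue end time from the text of an .srt file and compares it with the media length. The function names, the 60-second tolerance, and the 27-minute duration are assumptions for illustration; they are not part of whisper-faster.

```python
import re
from typing import Optional

# SRT cue timing lines look like "00:13:15,860 --> 00:13:18,860".
CUE_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def last_cue_end_seconds(srt_text: str) -> Optional[float]:
    """Return the latest cue end time in seconds, or None if no cue is found."""
    latest = None
    for match in CUE_TIMING.finditer(srt_text):
        # Groups 5-8 hold the end timestamp of the cue.
        hours, minutes, seconds, millis = (int(g) for g in match.groups()[4:])
        end = hours * 3600 + minutes * 60 + seconds + millis / 1000
        if latest is None or end > latest:
            latest = end
    return latest


def looks_truncated(srt_text: str, media_seconds: float,
                    tolerance_seconds: float = 60.0) -> bool:
    """Guess whether the subtitles stop well before the end of the media.

    tolerance_seconds is an arbitrary allowance (hypothetical default) for
    trailing silence or an untranscribed outro.
    """
    end = last_cue_end_seconds(srt_text)
    return end is None or (media_seconds - end) > tolerance_seconds


if __name__ == "__main__":
    # Last cue reported for the mp4 run in this thread, rewritten in SRT form.
    sample = (
        "1\n00:13:15,860 --> 00:13:18,860\n"
        "His greatest achievement was just around the corner.\n"
    )
    # Assumes the video is roughly 27 minutes long, as stated above.
    print(looks_truncated(sample, media_seconds=27 * 60))
```

In practice the text would come from the generated .srt file, and the media length from whatever tool you already use to inspect the video.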

Purfview commented 1 year ago

Are you running it on CUDA? If so, try the --compute_type=int8 parameter.

nhan000 commented 1 year ago

I added the parameter you gave me, so the command is now:

whisper-faster "C:\Users\ntnha\Videos\4K Video Downloader\Carl Sagan Astronomer of the People.mp4" --language en --model large-v2 --batch_recursive true --compute_type=int8

It ran on CUDA, and it still stopped at the same place.

Purfview commented 1 year ago

I reproduced this issue on my side. I'll check later what can be done about it. Interestingly, the hallucination starts at the advertisement.

nhan000 commented 1 year ago

Thanks for looking into this, and separately, thanks for making this program. Very noob-friendly for people who are not very techy like me.

The video has 3 advertisement segments:

One at the beginning, which Whisper Standalone doesn't transcribe for either the mp4 or the mp3 file.
One in the middle (13:19), which it transcribes in the mp3 file but where it stops for the mp4 file.
One at the end (26:46), which it also doesn't transcribe in the mp3 file.

Purfview commented 1 year ago

It doesn't get stuck with the --beam_size=5 option.

The ads at the start and end are still ignored; the models are probably trained to ignore that kind of ad. By the way, the tiny and base models do transcribe that ad.
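If you want to script the workaround, the following Python sketch assembles the command used in this thread with an optional beam size and compute type. The helper names `build_command` and `transcribe` are made up for illustration, the flags are copied from the commands quoted above, and it assumes `whisper-faster` is on your PATH.

```python
import subprocess
from typing import List, Optional


def build_command(media_path: str,
                  beam_size: Optional[int] = None,
                  compute_type: Optional[str] = None) -> List[str]:
    """Assemble the whisper-faster command line used in this thread.

    Only flags that appear in the thread are used; both optional settings
    are left out unless a value is given.
    """
    cmd = [
        "whisper-faster", media_path,
        "--language", "en",
        "--model", "large-v2",
        "--batch_recursive", "true",
    ]
    if beam_size is not None:
        cmd.append(f"--beam_size={beam_size}")
    if compute_type is not None:
        cmd.append(f"--compute_type={compute_type}")
    return cmd


def transcribe(media_path: str, **options) -> int:
    """Run the command and return its exit code (hypothetical wrapper)."""
    # Passing a list avoids quoting problems with spaces in Windows paths.
    return subprocess.run(build_command(media_path, **options)).returncode
```

For example, `transcribe(r"C:\path\to\video.mp4", beam_size=5)` would run the same command as above with the beam size option added; the path here is a placeholder.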

nhan000 commented 1 year ago

Thanks a lot! I will keep the beam size parameter in mind and change it around when I run into issues.