Const-me / Whisper

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
Mozilla Public License 2.0

The results are correctly output in the Debug Console, but the green progress bar is stuck at the end. #185

Open ASH521 opened 11 months ago

ASH521 commented 11 months ago

When I was adding subtitles to a series of video compilations, approximately 65 videos in total, only a few of them encountered the following issue: The videos are in MP4 format and can be played normally. The debug information shows that the subtitles have been transcribed to the last sentence, but the green progress bar remains stuck at 100% and the generated SRT file is 0 bytes. I have several specific videos that encounter this error. Do you need me to provide the video files? It is worth noting that the other videos have no issues, only these few videos repeatedly encounter this problem.

emcodem commented 11 months ago

Can you try extracting the audio to a WAV file before transcribing? If you want to do it with a user interface, I believe you can use whisperer.

LucianoKlein commented 10 months ago

I am experiencing the same problem. Only specific files trigger this issue.

RickArcher108 commented 10 months ago

A good free program for extracting audio from video is Panzera Free Audio Extractor. When I'm having problems, I sometimes solve them by running Whisper on the audio instead of the video, or vice versa.

LucianoKlein commented 10 months ago

The video files that trigger this issue for me are in MP4 format too. I extracted the audio from the video into M4A format, and it still runs into the same problem. Then I converted it to a .wav file, and the entire process finished successfully. My guess is that the bug has something to do with M4A and MP4 encoding.
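For anyone who wants to script the WAV workaround described above, here is a minimal sketch that calls ffmpeg from Python. It assumes ffmpeg is installed and on PATH. The function names are hypothetical, and the 16 kHz mono 16-bit PCM settings are an illustrative choice rather than something this project is known to require.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(source: str, target: str) -> list:
    """Build an ffmpeg command line that extracts audio as 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it already exists
        "-i", source,            # input video or audio file
        "-vn",                   # drop any video stream
        "-acodec", "pcm_s16le",  # uncompressed 16-bit little-endian PCM
        "-ar", "16000",          # 16 kHz sample rate (assumed, not a project requirement)
        "-ac", "1",              # downmix to mono
        target,
    ]


def extract_wav(source: str) -> Path:
    """Write <source stem>.wav next to the source file and return its path."""
    target = Path(source).with_suffix(".wav")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(source, str(target)), check=True)
    return target
```

Calling `extract_wav("episode01.mp4")` would produce `episode01.wav` in the same folder, which can then be transcribed instead of the original file.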

hhmmjjnn commented 1 month ago

The transcription gets stuck at 100% forever.

We've got a file that triggers this bug every single time. Its content is highly sensitive and we are not allowed to share it with anyone else under any circumstances.

We let it run for several hours, but it won't budge.

When running on CPU, the usage is stuck at 12% forever on an 8-core machine. Doing the maths, that seems to suggest that one core is at 100%, I guess?
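The arithmetic behind that guess, as a quick check. It assumes the monitoring tool reports usage as a fraction of all 8 cores combined and drops the decimal when displaying it:

```python
logical_cores = 8

# Share of total CPU taken by a single fully saturated core, in percent
one_core_share = 100.0 / logical_cores

print(one_core_share)  # 12.5
```

A displayed 12% is therefore consistent with a single thread spinning at full speed while the other cores stay idle.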

When running from the .NET API, the runFull call never returns. Callbacks.onNewSegment is called normally, until it reaches 100%, then never again.

If I were to guess, I would bet all my coins on this loop missing some break, somewhere:

https://github.com/Const-me/Whisper/blob/306aadd1fce4b168cd38659236f4ba7c1603cebd/Whisper/source/whisper.cpp#L2861-L2861
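To illustrate the kind of bug I have in mind, here is a purely hypothetical sketch in Python. It is not the project's code and I have not verified what the linked loop actually does. The names `run_all` and `decode` are made up. The idea is that a loop which advances a seek offset by however many frames the decoder consumed will spin forever if the decoder ever reports zero progress on the final chunk, unless there is an explicit guard.

```python
def run_all(total_frames, chunk_frames, decode):
    """Process audio in chunks.

    decode(seek, length) is a hypothetical callback returning
    (text, frames_consumed) for the chunk starting at `seek`.
    """
    seek = 0
    segments = []
    while seek < total_frames:
        length = min(chunk_frames, total_frames - seek)
        text, consumed = decode(seek, length)
        if text:
            segments.append(text)
        if consumed <= 0:
            # No-progress guard: without this break, `seek` never changes
            # and the loop runs forever, keeping one core busy.
            break
        seek += consumed
    return segments


def fake_decode(seek, length):
    # Stand-in decoder: consumes full 100-frame chunks but reports
    # zero progress on a shorter trailing chunk.
    return ("s%d" % seek, length if length == 100 else 0)


print(run_all(250, 100, fake_decode))  # ['s0', 's100', 's200']
```

With the guard removed, the same call would never return after emitting the last segment, which matches the symptoms: all segments are reported, then the call hangs with one core at 100%.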