SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

Super long video processing failure #757

Open · another1s opened this issue 3 months ago

another1s commented 3 months ago

I am stuck on a long-video (about 5 hours) processing task. It has been running for about 24 hours and still has not finished, which is far longer than I expected, and it may fail to produce an ASR result at all. Is there an upper limit on video length for the model? I suspect it fell into an infinite sequence-generation loop.

Purfview commented 3 months ago

5 hours shouldn't be a problem on modern hardware. Try a smaller model if you don't have a GPU with CUDA.
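The suggestion above can be sketched in code. `WhisperModel` and the model size names ("large-v3", "small") are real faster-whisper identifiers, but `pick_model_size` is a hypothetical helper added here for illustration, and the exact compute settings are assumptions:

```python
# Hedged sketch: pick a Whisper model size based on whether a CUDA GPU is
# available, as Purfview suggests. pick_model_size() is a made-up helper;
# WhisperModel and the size names come from faster-whisper itself.

def pick_model_size(has_cuda: bool) -> str:
    """Large model on a CUDA GPU, small model on CPU-only machines."""
    return "large-v3" if has_cuda else "small"

# Usage (assumes faster-whisper is installed and audio.wav exists):
#
# from faster_whisper import WhisperModel
# size = pick_model_size(has_cuda=True)
# model = WhisperModel(size, device="cuda", compute_type="float16")
# segments, info = model.transcribe("audio.wav")
```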

another1s commented 3 months ago

5 hours shouldn't be a problem on modern hardware. Try a smaller model if you don't have a GPU with CUDA.

I agree, but it just happened and I really have no idea why it is so unexpectedly slow. It finally terminated with a bunch of hallucinated output: some irrelevant sentences were generated repeatedly. I did use an RTX 4090 and the latest version of CUDA, but it remains unexpectedly slow. During inference it used 3688 MB of GPU memory and took 2640 seconds to process a video with a duration of 1200 seconds.

I am really confused.
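For context, the numbers reported above work out to a real-time factor well above 1, i.e. slower than real time. A quick sketch of the arithmetic, using only the values from this comment:

```python
# Real-time factor (RTF): processing time divided by audio duration.
# Anything above 1.0 means the pipeline runs slower than real time.

def real_time_factor(processing_s: float, audio_s: float) -> float:
    return processing_s / audio_s

# Values reported in the comment above: 2640 s to process 1200 s of video.
rtf = real_time_factor(2640, 1200)
print(f"RTF = {rtf:.1f}x")  # prints "RTF = 2.2x"
```

An RTX 4090 running faster-whisper alone would normally be far below real time, which is a hint that something besides transcription is eating the time.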

Purfview commented 3 months ago

Share your command.

another1s commented 3 months ago

my_code.txt (I only pasted the relevant function here.)

(Screenshots: server, output_result1.) The figures above are my GPU info screenshot and a glimpse of the output (too much to paste). The first line of the result is literally the same as the issue mentioned in https://github.com/openai/whisper/discussions/2015: a sentence, perhaps from the training data or somewhere else, was generated. I am wondering whether video length correlates with the probability of model hallucination.

another1s commented 3 months ago

I guess my program ran so slow because of hallucination?

Purfview commented 3 months ago

the first line of the result is literally the same as the issue mentioned in...

One line of hallucination is nothing to worry about.

I guess my program ran so slow because of hallucination?

It's slow because you are running diarization there.
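If that stray line does become a problem on long audio, `transcribe()` exposes options commonly used to curb it; `vad_filter` and `condition_on_previous_text` are real faster-whisper parameters, but the particular combination below is an untested sketch, not a definitive fix:

```python
# Hedged sketch: transcribe() options often suggested against long-audio
# hallucination. vad_filter skips non-speech stretches before decoding, and
# condition_on_previous_text=False stops a hallucinated segment from seeding
# the next decoding window.

def anti_hallucination_kwargs() -> dict:
    return {
        "vad_filter": True,                   # drop long silences first
        "condition_on_previous_text": False,  # don't feed prior text back in
    }

# Usage (assumes a faster-whisper WhisperModel instance named `model`):
# segments, info = model.transcribe("audio.wav", **anti_hallucination_kwargs())
kwargs = anti_hallucination_kwargs()
```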

another1s commented 3 months ago

It's slow because you are running diarization there.

Oh, I thought it was a tiny model and would not matter. Thanks for your help; bug fixed.
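To confirm which stage actually dominates (as diarization did here), a small per-stage timer is handy. The stage names in the comments are hypothetical placeholders for whatever the real pipeline calls:

```python
# Minimal per-stage wall-clock timer, to check which pipeline stage
# (transcription vs. diarization, etc.) is eating the runtime.
import time
from contextlib import contextmanager

timings: dict = {}

@contextmanager
def timed(stage: str):
    """Record elapsed wall-clock seconds for one named pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

# Hypothetical usage: wrap each stage separately, e.g.
#   with timed("transcription"): segments = list(model.transcribe(...)[0])
#   with timed("diarization"):   diarize(audio_path)
with timed("demo_stage"):
    sum(range(1000))  # stand-in for real work
print(timings)
```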