Const-me / Whisper

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
Mozilla Public License 2.0
8.31k stars, 714 forks

I tried it on a long-form video podcast which is about 4.5 hours long and found lots of punctuation missing #20

Open wuzimi opened 1 year ago

wuzimi commented 1 year ago

Dear Sir, I am really impressed with your application. Today I tried to generate a txt transcript for a 4.5-hour Huberman Lab podcast and it was really fast. I checked the transcript and most of the punctuation is good, except toward the end of the text, where a lot of punctuation is missing. Maybe I should split the video file into two and then try again. Thanks!

zelenooki87 commented 1 year ago

You need to convert the video to an audio format. At least in my case there was no mismatched output once I converted to WAV. Maybe the project dev could include ffmpeg, which would do this for us.
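For anyone who wants to do the conversion by hand in the meantime, here is a minimal sketch of the ffmpeg call, wrapped in Python. The helper name `build_ffmpeg_wav_command`, the file names, and the choice of mono 16 kHz 16-bit PCM are assumptions made for illustration (Whisper models operate on 16 kHz audio); nothing in this thread confirms which WAV variants the application accepts.

```python
import shlex


def build_ffmpeg_wav_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg argument list that extracts the audio track as a WAV file.

    Hypothetical helper: the output format (mono, 16 kHz, 16-bit PCM) is an
    assumption, not a documented requirement of the application.
    """
    return [
        "ffmpeg",
        "-i", video_path,        # input video file
        "-vn",                   # drop the video stream
        "-ac", "1",              # downmix to one audio channel
        "-ar", str(sample_rate), # resample to the given rate in Hz
        "-c:a", "pcm_s16le",     # encode as 16-bit little-endian PCM
        wav_path,
    ]


if __name__ == "__main__":
    cmd = build_ffmpeg_wav_command("podcast.mp4", "podcast.wav")
    # Print the command so it can be pasted into a terminal.
    print(shlex.join(cmd))
    # To run it directly (requires ffmpeg on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to check the command before touching a 4.5-hour file.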

wuzimi commented 1 year ago

Thanks! I just tried and it succeeded. Also, yesterday I tried converting the video file to an mp3 file and then split it into two files. I had no problem with the first one but ran into a problem with the second half: it always stopped somewhere near the end of the transcription process. I tried several times and finally adjusted the advanced settings and selected wav32. Then it worked again. Just now I tried converting the video to a WAV file and found there are several kinds of WAV types. Most of the WAV files don't work with your application. Only WAV-GSM works.

Another thing I would like to suggest improving is the SRT subtitles. I find that some subtitle lines are too long. My suggestions are as follows.

  1. Allow the user to set the maximum characters per line, similar to the Subtitle Edit setting.
  2. It is best to end each subtitle line with punctuation such as , . ? ! etc. This avoids splitting one whole sentence across different subtitle lines, which makes it difficult to translate into other languages.
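To make the two suggestions concrete, here is a rough sketch of the kind of line splitting I mean. It is only an illustration, not how the application works: the function name `split_subtitle_text`, the default of 42 characters, and the punctuation set are all assumptions, and it ignores subtitle timestamps entirely.

```python
# Characters that are treated as acceptable line endings (an assumption).
PUNCTUATION = (",", ".", "?", "!", ";", ":")


def split_subtitle_text(text, max_chars=42):
    """Split text into lines of at most max_chars, preferring to break
    right after a word that ends with punctuation.

    A single word longer than max_chars is kept intact on its own line.
    """
    lines = []
    current = []  # words of the line being built
    for word in text.split():
        # Flush lines until the next word fits (or the line is empty).
        while current and len(" ".join(current + [word])) > max_chars:
            cut = len(current)  # fallback: break at the word boundary
            for i in range(len(current) - 1, -1, -1):
                if current[i].endswith(PUNCTUATION):
                    cut = i + 1  # break just after the punctuation
                    break
            lines.append(" ".join(current[:cut]))
            current = current[cut:]
        current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


if __name__ == "__main__":
    sample = ("I tried it on a long podcast, and it was really fast. "
              "Most punctuation looked good.")
    for line in split_subtitle_text(sample, max_chars=40):
        print(line)
```

The fallback to a plain word boundary matters for long stretches without punctuation, which is exactly the case reported at the end of the transcript.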

Thanks again for your work!

zelenooki87 @.***> wrote on Saturday, February 11, 2023, at 19:49:

You need to convert the video to an audio format. At least in my case there was no mismatched output once I converted to WAV. Maybe the project dev could include ffmpeg, which would do this for us.
