SubtitleEdit / subtitleedit

the subtitle editor :)
http://www.nikse.dk/SubtitleEdit/Help
GNU General Public License v3.0

Whisper model large-v2 - best workflow? #6611

Closed (franzau closed this issue 1 year ago)

franzau commented 1 year ago

I just got subtitles with Whisper from a two-minute, high-quality video in Danish using the large-v2 model. I'm using a mid-2012 MacBook with 16 GB of RAM, running Ubuntu 22.04.

Since the transcription never finished from within SE, I tried from the command line and:

I understand the issue is between my computer and Whisper and not SE, but I still wonder how best to use the large-v2 model for production.

From Whisper's side I get two messages:

    UserWarning: FP16 is not supported on CPU; using FP32 instead
      warnings.warn("FP16 is not supported on CPU; using FP32 instead")

and, later:

    [pcm_s16le @ 0x55c84b0508c0] Bitrate 128 is extremely low, maybe you mean 128k
    --- Metadata below
      Metadata:
        major_brand     : isom
        minor_version   : 512
        compatible_brands: isomiso2avc1mp41
        title           : Kajakenergi - Gruppeopmærksomhed - Det Vigtige Hvorfor
        artist          : Tue Olesen
        encoder         : Lavf58.20.100
        keywords        : Main
      Duration: 00:02:06.21, start: 0.000000, bitrate: 695 kb/s
        Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
        Metadata:
          handler_name    : SoundHandler
          vendor_id       : [0][0][0][0]
        Stream #0:1(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 563 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
        Metadata:
          handler_name    : Core Media Video
          vendor_id       : [0][0][0][0]
    Stream mapping:
      Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
    Press [q] to stop, [?] for help
    ---- end

Is there something wrong here? Should I just accept that my computer is slow or is there something I can tweak?
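For reference, here is a minimal sketch of a CPU-only command-line run. The FP16 message is only a warning: Whisper falls back to FP32 on CPU, and passing `--fp16 False` should avoid it. The file name `video.mp4` and the helper `build_whisper_command` are hypothetical, and the flags are those of the `openai-whisper` CLI as far as I know them, so it is worth checking `whisper --help` on your own install.

```python
import shlex


def build_whisper_command(media_path, model="large-v2", language="da",
                          output_dir="."):
    """Return the argument list for a CPU-only openai-whisper CLI run.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "whisper", media_path,
        "--model", model,            # e.g. "medium" is faster but less accurate
        "--language", language,      # skip language auto-detection
        "--fp16", "False",           # CPU has no FP16 support; use FP32 directly
        "--output_format", "srt",    # subtitle file that SE can open
        "--output_dir", output_dir,
    ]


if __name__ == "__main__":
    # "video.mp4" is a placeholder file name.
    print(shlex.join(build_whisper_command("video.mp4")))
```

The printed command can be pasted into a terminal. On an older CPU, a smaller model such as `medium` will probably finish much sooner, at some cost in accuracy.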

Another question: would it make sense to run Whisper from a notebook on Colaboratory or something similar to "borrow" computing power, and then fine-tune everything in SE?

(I am really impressed by large-v2. I need reliable timestamps as much as a good transcription, maybe even more.)

darnn commented 1 year ago

Re: Colab, yes, it would make a lot of sense. This is much faster for me than my own computer is: https://colab.research.google.com/github/ANonEntity/WhisperWithVAD/blob/main/WhisperWithVAD.ipynb
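If you run Whisper in a notebook yourself, the result still has to reach SE as a subtitle file. Below is a minimal sketch of that step, assuming the segments arrive as dicts with `start` and `end` (in seconds) and `text`, which is how openai-whisper's `transcribe()` reports them as far as I know. The helper names and the sample segments are made up for illustration.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn a list of {'start', 'end', 'text'} dicts into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Blank line between cues, as the SRT format expects.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up sample data, not output from a real transcription.
    sample = [
        {"start": 0.0, "end": 2.5, "text": " First line."},
        {"start": 2.5, "end": 5.0, "text": " Second line."},
    ]
    print(segments_to_srt(sample))
```

Saving that text as a `.srt` file and opening it in SE gives you the timestamps to fine-tune there.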

franzau commented 1 year ago

Thanks a lot, darnn!