Closed: niksedk closed this 11 months ago
OpenAI just released the large-v3 model. Stuff needs to be updated. :)
@niksedk Could you add lzma support with this -> https://github.com/weltkante/managed-lzma?
> @niksedk Could you add lzma support with this -> https://github.com/weltkante/managed-lzma?
Looks like a dead package... not updated in 6 years.
> OpenAI just released the large-v3 model. Stuff needs to be updated. :)
Whisper OpenAI updated with large-v3 in latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/4.0.1/SubtitleEditBeta.zip
Hopefully faster-whisper, cpp, const-me etc. will be updated soon, but that might be in a future SE version.
> ...not updated in 6 years.
I like that. Does it need to be updated to something?
cuBLAS and cuDNN libs need to be updated.
> cuBLAS and cuDNN libs need to be updated.
What is v2 and what is v3?
v2 is older; v3 is newer but has no Kepler chip support. I've no idea if Kepler cards work with faster-whisper at all... Both fix the bug spotted in the "v1" libs.
> v2 is older; v3 is newer but has no Kepler chip support. I've no idea if Kepler cards work with faster-whisper at all... Both fix the bug spotted in the "v1" libs.
Thx, libs updated :)
> Hopefully faster-whisper, cpp, const-me etc. will be updated soon, but that might be in a future SE version.
The large-v3 HF model is released now; tonight I'll try to adapt it.
> Thx, libs updated :)
Maybe better to update to v2 instead of v3? Or you'll get a bunch of users with issues. :)
I looked at Wikipedia; these are the cards supported by faster-whisper but not supported by v3 [if Kepler chip]:
https://en.wikipedia.org/wiki/GeForce_700_series
https://en.wikipedia.org/wiki/GeForce_800M_series
Released Whisper-Faster r160.3 with large-v3 support, here is a link to the model.
There's a bleeding edge build of whisper.cpp linked here: https://github.com/ggerganov/whisper.cpp/issues/1437 And here's the actual link, in case you don't want to go digging for it: https://github.com/ggerganov/whisper.cpp/files/13285097/whisper.cpp-39a240b-win64-openblas.zip
This supports large-v3 in ggml format, which you can now get here: https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-large.bin Since, per this thread: https://github.com/Const-me/Whisper/issues/188 "they renamed the current Large to Large-V2 and Large is now the V3".
The V3 in ggml format seemed to work more or less okay for me with the new whisper.cpp, but produces much worse results with the modified version of Const-Me in the last thread I linked.
> Released Whisper-Faster r160.3 with large-v3 support, here is a link to the model.
I manually updated to Whisper-Faster r160.3 and cloned the V3 model folder, but it isn't showing up in the drop-down menu. Is something wrong on my side, or does SE need to be updated?
> ...SE needs to be updated?
Yes.
Has anybody tested and compared the new Whisper-Faster r160.3 vs the old Whisper-Faster r153?
Yes, I just did. I always use V2 with beam 10, and I didn't notice anything off with r160.3. This one probably transcribed the hour-long video a minute quicker.
> Has anybody tested and compared the new Whisper-Faster r160.3 vs the old Whisper-Faster r153?
I tried it from CLI with large-v3 (libs-v3) and it worked fine. However, I haven't run any comparison tests.
I have some trouble running Purfview's Faster-Whisper with large-v3...
Date: 11/09/2023 19:40:41
SE: 4.0.1.409 - Microsoft Windows NT 10.0.22621.0 - 64-bit
Message: Calling whisper (Purfview's Faster-Whisper) with : C:\git\subtitleedit\src\ui\bin\Debug\Whisper\Purfview-Whisper-Faster\whisper-faster.exe --language en --model "large-v3" "C:\Users\nikse\AppData\Local\Temp\a5fa4bb4-3cca-4d5b-bb58-7522a6e9bb3e.wav"
Standalone Faster-Whisper r160.3 running on: CPU
Starting transcription on: C:\Users\nikse\AppData\Local\Temp\a5fa4bb4-3cca-4d5b-bb58-7522a6e9bb3e.wav
Traceback (most recent call last):
  File "D:\whisper-fast\__main__.py", line 681, in <module>
  File "D:\whisper-fast\__main__.py", line 622, in cli
  File "faster_whisper\transcribe.py", line 961, in restore_speech_timestamps
  File "faster_whisper\transcribe.py", line 447, in generate_segments
  File "faster_whisper\transcribe.py", line 680, in generate_with_fallback
ValueError: <|startoftranscript|> token was not found in the prompt
[26752] Failed to execute script '__main__' due to unhandled exception!
Calling whisper Purfview's Faster-Whisper done in 00:00:32.6182623
Hm, auto-dl in SE seems to break it... SE must corrupt the model...
Can you share that wav?
> Can you share that wav?
SE seems to break the model during download...
(Edit: it was due to vocabulary.txt being renamed to vocabulary.json)
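As a side note, a corrupted or incomplete model download like this can be caught before transcription with a simple sanity check. The sketch below is a hypothetical helper, not part of SE or Faster-Whisper; the file names are assumptions based on a typical CTranslate2 model layout, where large-v3 ships vocabulary.json while older models used vocabulary.txt.

```python
# Hypothetical sketch: verify a downloaded Faster-Whisper model folder is complete.
import os

REQUIRED = ("model.bin", "config.json")
VOCAB_CANDIDATES = ("vocabulary.txt", "vocabulary.json")  # name changed with large-v3

def model_folder_ok(path: str) -> bool:
    """Return True if the folder has the core model files plus one vocabulary file."""
    if not all(os.path.isfile(os.path.join(path, name)) for name in REQUIRED):
        return False
    # Either vocabulary file name is acceptable, depending on the model version.
    return any(os.path.isfile(os.path.join(path, name)) for name in VOCAB_CANDIDATES)
```

A check like this won't detect a truncated model.bin, but it would have flagged the missing/renamed vocabulary file immediately.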
SE beta updated with Purfview's Faster Whisper with large-v3: https://github.com/SubtitleEdit/subtitleedit/releases/download/4.0.1/SubtitleEditBeta.zip
Looking nice. :)
One additional (minor) improvement for #7560 and #7601:
Starting situation:
Press "Set end, add new, and go to new":
Press undo / Ctrl+Z:
Would it be possible to re-select the subtitle on the left, as before the action, so I can retry directly?
By the way, #7329 is still happening for me, but I haven't had the time to fully investigate it, besides commenting out some lines and changing some rendering flags, to no avail. I'm sorry...
I assume you haven't encountered it on your end? Or maybe with my settings?
> SE beta updated with Purfview's Faster Whisper with large-v3: https://github.com/SubtitleEdit/subtitleedit/releases/download/4.0.1/SubtitleEditBeta.zip
I've tried this beta version of Subtitle Edit and the v3 model. Great work. Thank you for your time and effort.
It's better at transcribing Arabic, but it ignores any English words within the audio. I retried transcribing the same audio sample (on the same Subtitle Edit beta version) with the v2 model, and it easily recognized the English words and transcribed them along with the Arabic text.
Does this have to do with the smaller size of this v3 model (is there another larger version?) and is there a way to force both Arabic and English transcription of the same text?
Thank you for your help.
@Nomad234, Did it add periods randomly in the middle of sentences?
> @Nomad234, Did it add periods randomly in the middle of sentences?
> Does this have to do with the smaller size of this v3 model
No.
> Did it add periods randomly in the middle of sentences?
I didn't notice such. Can you share a short sample for this?
> Does this have to do with the smaller size of this v3 model
> No.
> Did it add periods randomly in the middle of sentences?
> I didn't notice such. Can you share a short sample for this?

Check with "Start uppercase after paragraph" in Common Errors and you'll see so many instances. 1.txt
I meant an audio sample.
Sorry, can't share audio for this. NDA. I'll find a public audio and share.
1.zip Try this. V3 Eng, beam size 8.
Can't reproduce. Looks good for me with large-v3 and beam 8:
[00:00.000 --> 00:07.060] Today on This Old House, I'll tour this modern home to show how beautiful features and accessible design go hand-in-hand.
[00:07.900 --> 00:13.420] We're mixing mortar to patch the original brick on this 1960 mid-century modern.
[00:14.260 --> 00:18.420] And I'll help the homeowner, Billy, build a DIY ramp for his son at camp.
[00:46.280 --> 00:50.960] Hey there, I'm Kevin O'Connor and welcome back to our project here in Lexington, Massachusetts.
[00:51.300 --> 00:55.600] That's part of our 45th season of This Old House.
Your shared srt looks like a post-processed one, so the issue probably is in SE.
You are right. I just redid the file with "use post-processing" unchecked, and there were no random periods. It is indeed an SE issue; maybe @niksedk would look into it.
I think post-processing/reading subtitles from JSON would be a nice feature; that way lines would be split very accurately, since the JSON has timestamps for every word.
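To illustrate the idea of splitting lines from word-level timestamps: the sketch below greedily packs words into lines under a character limit, carrying each line's start/end from its first and last word. The input schema (a list of `{"word", "start", "end"}` dicts) is an assumption for illustration, not the exact JSON any tool emits.

```python
# Sketch: build subtitle lines from per-word timestamps.
def words_to_lines(words, max_chars=42):
    """Greedily pack words into lines, keeping each line's text under max_chars."""
    lines, current = [], []
    for w in words:
        candidate = " ".join(x["word"] for x in current + [w])
        if current and len(candidate) > max_chars:
            # Flush the current line; its timing spans first to last word.
            lines.append({
                "text": " ".join(x["word"] for x in current),
                "start": current[0]["start"],
                "end": current[-1]["end"],
            })
            current = [w]
        else:
            current.append(w)
    if current:
        lines.append({
            "text": " ".join(x["word"] for x in current),
            "start": current[0]["start"],
            "end": current[-1]["end"],
        })
    return lines
```

A real implementation would also break on punctuation and silence gaps, but even this naive version gives exact split timestamps, since every boundary falls on a known word boundary.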
> Does this have to do with the smaller size of this v3 model?
> No.
Thank you for your prompt reply. But what could possibly be the cause for neglecting the English words in otherwise predominantly Arabic speech? Is there some kind of setting that could solve this problem?
Thanks for helping me out!
> But what could possibly be the cause for neglecting the English words in an otherwise predominantly Arabic voice text?
Probably the model was trained that way.
> Is there some kind of setting that could solve this problem?
Choose another model.
@niksedk large-v3 adds one more language: Cantonese (yue).
> is there another larger version?
@Nomad234 Actually there is, but SE only has a selection for the "standard reference Whisper" ones, while Faster-Whisper can have 7 standard model variations per size. :)
To use them you would need to run Faster-Whisper directly.
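For running the standalone executable directly, a minimal sketch of assembling the command line is below. The executable name and the `--model`/`--language` flags are taken from the invocation shown earlier in this thread; `--beam_size` and the exact model-variant names are assumptions, so check the tool's own `--help` output.

```python
# Hypothetical sketch: build an argument list for the standalone Faster-Whisper CLI.
# Run the result with subprocess.run(cmd) once the paths and flags are verified.
def build_whisper_command(exe, audio, model="large-v3", language="ar", beam_size=5):
    """Assemble the CLI argument list; all flag names are assumptions to verify."""
    return [
        exe,
        "--model", model,
        "--language", language,
        "--beam_size", str(beam_size),
        audio,
    ]
```

Building the list first (instead of a single shell string) avoids quoting problems with paths that contain spaces, which is common for files under `C:\Users\...`.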
Released Standalone Faster-Whisper r160.4.
> Released Standalone Faster-Whisper r160.4.
Avast puts it in quarantine.
> Released Standalone Faster-Whisper r160.4.
thx :) Beta updated: https://github.com/SubtitleEdit/subtitleedit/releases/download/4.0.1/SubtitleEditBeta.zip
> Released Standalone Faster-Whisper r160.4.
> Avast puts it in quarantine.
Yeah, Avast and AVG flag all PyInstaller >=v6 files as generally suspicious. You can send them a false-positive report, but there is no point for Avast as its support is very slow; it usually takes a few weeks for them to process it, and by then there will be a new version... AVG support is pretty fast, about a few days.
And there are a few other programs unknown to me -> virustotal for r160.4.
> is there another larger version?
> @Nomad234 Actually there is, but SE only has a selection for the "standard reference Whisper" ones, while Faster-Whisper can have 6 standard model variations per size. :)
> To use them you would need to run Faster-Whisper directly.
For a noob like me, can you tell me how to prompt for those models in cmd (in the Whisper-Faster folder)? And in your opinion, which model could best serve my use case (mainly Arabic speech, with occasional English)?
Appreciate your help!
@Nomad234 if it's not about Subtitle Edit, then open issues in the right repositories. In this case, there -> https://github.com/Purfview/whisper-standalone-win/issues
@niksedk can you check that OCR thing?
Btw, how do I remove the video player window? I forgot, and I overwrote my settings with a new file, so I don't know how, and I don't need it.
Use the layout icon, or assign a keyboard shortcut to it in Settings -> Shortcuts -> General -> Choose layout.
Released Whisper-Faster r160.5.
large-v3 now uses this model by default -> https://huggingface.co/Purfview/faster-whisper-large-v3-fp16
Implemented a request to output subtitles with one word per line via the --one_word switch. That way you can get more accurate timestamps on splits.
People use "Merge short lines..." for these, but the "post-processing" option produces a bit borked output.
Here is an example of such subs: https://we.tl/t-zSDxp8pvmm
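The merge step that "Merge short lines..." performs on one-word cues can be sketched as follows. Cues are modeled here as `(start, end, text)` tuples in seconds; the gap and word-count thresholds are arbitrary illustrative choices, not SE's actual defaults.

```python
# Sketch: merge one-word-per-line cues (as produced by --one_word) into full lines,
# splitting on long pauses or when a line gets too many words.
def merge_one_word_cues(cues, max_gap=0.6, max_words=7):
    """Merge consecutive (start, end, text) word cues into line-level cues."""
    merged, current = [], []
    for cue in cues:
        # Start a new line on a long silence gap or when the line is full.
        if current and (cue[0] - current[-1][1] > max_gap or len(current) >= max_words):
            merged.append((current[0][0], current[-1][1],
                           " ".join(c[2] for c in current)))
            current = []
        current.append(cue)
    if current:
        merged.append((current[0][0], current[-1][1],
                       " ".join(c[2] for c in current)))
    return merged
```

Because every merged line inherits the first word's start and the last word's end, the split points stay as accurate as the per-word timestamps, which is exactly the benefit claimed for --one_word above.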
Is this available in Subtitle Edit right now? I can't find any new model in the dropdown menu. Should I delete the previous v3 model (1.5 GB) for it to appear?
It's time for the next update - 4.0.2 :)
This version includes layouts (which replace Show/hide video and Show/hide waveform). The new layouts make it much easier to create subtitles for a mobile video in 9:16 format and/or using a vertical monitor. Layouts also make it possible to have the video on the left, plus a few other options (layouts have shortcuts too).
This version also includes:
Remove interjections is now language-specific, easier to edit, and has a skip list.
Lots of minor improvements + bug fixes: https://raw.githubusercontent.com/SubtitleEdit/subtitleedit/master/Changelog.txt
Give the beta a test run ❤️ https://github.com/SubtitleEdit/subtitleedit/releases/download/4.0.1/SubtitleEditBeta.zip