SubtitleEdit / subtitleedit

the subtitle editor :)
http://www.nikse.dk/SubtitleEdit/Help
GNU General Public License v3.0
8.31k stars, 890 forks

Release Subtitle Edit 4.0.2 #7593

Closed: niksedk closed this 10 months ago

niksedk commented 10 months ago

It's time for the next update - 4.0.2 :)

This version includes layouts (which replace Show/hide video and Show/hide waveform). The new layouts make it much easier to create subtitles for mobile video in 9:16 format and/or on a vertical monitor. Layouts also make it possible to have the video on the left, plus a few other options (layouts have shortcuts too):

[screenshot]

This version also includes:

Lots of minor improvements + bug fixes: https://raw.githubusercontent.com/SubtitleEdit/subtitleedit/master/Changelog.txt

Give the beta a test run ❤️ https://github.com/SubtitleEdit/subtitleedit/releases/download/4.0.1/SubtitleEditBeta.zip

Purfview commented 10 months ago

OpenAI just released large-v3 model. Stuff needs to be updated. :)

Purfview commented 10 months ago

@niksedk Could you add lzma support with this -> https://github.com/weltkante/managed-lzma?

niksedk commented 10 months ago

> @niksedk Could you add lzma support with this -> https://github.com/weltkante/managed-lzma?

Looks like a dead package... not updated in 6 years.

niksedk commented 10 months ago

> OpenAI just released large-v3 model. Stuff needs to be updated. :)

Whisper OpenAI is updated with large-v3 in the latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/4.0.1/SubtitleEditBeta.zip

Hopefully faster-whisper, cpp, const-me etc. will be updated soon, but that might be in a future SE version.

Purfview commented 10 months ago

> ...not updated in 6 years.

I like that. Does it need to be updated for anything?

Purfview commented 10 months ago

cuBLAS and cuDNN libs need to be updated.

niksedk commented 10 months ago

> cuBLAS and cuDNN libs need to be updated.

What is v2 and what is v3?

Purfview commented 10 months ago

v2 is older; v3 is newer but has no support for Kepler chips. I've no idea if Kepler cards work with faster-whisper at all... Both fix the bug spotted in the "v1" libs.

niksedk commented 10 months ago

> v2 is older; v3 is newer but has no support for Kepler chips. I've no idea if Kepler cards work with faster-whisper at all... Both fix the bug spotted in the "v1" libs.

Thx, libs updated :)

Purfview commented 10 months ago

> Hopefully faster-whisper, cpp, const-me etc. will be updated soon, but that might be in a future SE version.

The large-v3 HF model is released now; tonight I'll try to adapt it.

Purfview commented 10 months ago

> Thx, libs updated :)

Maybe it's better to update to v2 instead of v3? Otherwise you'll get a bunch of users with issues. :)

I looked at Wikipedia. These are the cards that are supported by faster-whisper but not by v3 [if Kepler chip]: https://en.wikipedia.org/wiki/GeForce_700_series https://en.wikipedia.org/wiki/GeForce_800M_series
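Here is a minimal sketch of the v2/v3 choice. It assumes the GPU's CUDA compute capability is already known as a string such as "3.7". Recent NVIDIA drivers can report it with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`. Kepler chips report compute capability 3.x, so they get the older v2 libs. The function name and the "v2"/"v3" labels are hypothetical and are not part of SE or Purfview's packages. The sketch also assumes that everything newer than Kepler works with v3, and it simply rejects pre-Kepler GPUs.

```python
def pick_cuda_libs(compute_cap: str) -> str:
    """Pick a cuBLAS/cuDNN lib package label for a GPU (hypothetical labels).

    compute_cap is a string like "3.7" or "8.6". Kepler GPUs report a
    major version of 3 and the v3 libs drop Kepler support, so those GPUs
    get the older v2 libs; newer GPUs are assumed to be fine with v3.
    """
    major = int(compute_cap.strip().split(".")[0])
    if major < 3:
        raise ValueError(
            f"pre-Kepler GPU (compute capability {compute_cap.strip()}) is not handled"
        )
    return "v2" if major == 3 else "v3"


if __name__ == "__main__":
    # Example values: a Kepler card, a Maxwell card, an Ampere card.
    for cap in ["3.5", "5.0", "8.6"]:
        print(cap, "->", pick_cuda_libs(cap))
```

Note that not every GeForce 700 card is Kepler (the 750/750 Ti are Maxwell, compute capability 5.0). That is why this sketch keys on compute capability rather than on the model name.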

Purfview commented 10 months ago

Released Whisper-Faster r160.3 with large-v3 support; here is a link to the model:

darnn commented 10 months ago

There's a bleeding-edge build of whisper.cpp linked here: https://github.com/ggerganov/whisper.cpp/issues/1437. And here's the actual link, in case you don't want to go digging for it: https://github.com/ggerganov/whisper.cpp/files/13285097/whisper.cpp-39a240b-win64-openblas.zip

This supports large-v3 in ggml format, which you can now get here: https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-large.bin (per this thread: https://github.com/Const-me/Whisper/issues/188, "they renamed the current Large to Large-V2 and Large is now the V3").

The V3 in ggml format seemed to work more or less okay for me with the new whisper.cpp, but it produces much worse results with the modified version of Const-Me from the last thread I linked.

uckthis commented 10 months ago

> Released Whisper-Faster r160.3 with large-v3 support, here is link to the model,

I manually updated to Whisper-Faster r160.3 and cloned the V3 model folder, but it isn't showing up in the drop-down menu. Is something wrong on my side, or does SE need to be updated? [screenshot]

Purfview commented 10 months ago

> ...SE needs to be updated?

Yes.

niksedk commented 10 months ago

Has anybody tested and compared the new Whisper-Faster r160.3 vs the old Whisper-Faster r153?

uckthis commented 10 months ago

Yes, I just did. I always use V2 with beam 10, and I didn't notice anything off with r160.3. This one probably transcribed the hour-long video a minute quicker.

JDTR75 commented 10 months ago

> Has anybody tested and compared the new Whisper-Faster r160.3 vs the old Whisper-Faster r153?

I tried it from CLI with large-v3 (libs-v3) and it worked fine. However, I haven't run any comparison tests.

niksedk commented 10 months ago

I have some trouble running Purfview's Faster-Whisper with large-v3...

Date: 11/09/2023 19:40:41
SE: 4.0.1.409 - Microsoft Windows NT 10.0.22621.0 - 64-bit
Message: Calling whisper (Purfview's Faster-Whisper) with : C:\git\subtitleedit\src\ui\bin\Debug\Whisper\Purfview-Whisper-Faster\whisper-faster.exe --language en --model "large-v3"  "C:\Users\nikse\AppData\Local\Temp\a5fa4bb4-3cca-4d5b-bb58-7522a6e9bb3e.wav"
Standalone Faster-Whisper r160.3 running on: CPU
Starting transcription on: C:\Users\nikse\AppData\Local\Temp\a5fa4bb4-3cca-4d5b-bb58-7522a6e9bb3e.wav
[26752] Failed to execute script '__main__' due to unhandled exception!
ValueError: <|startoftranscript|> token was not found in the prompt
File "faster_whisper\transcribe.py", line 680, in generate_with_fallback
File "faster_whisper\transcribe.py", line 447, in generate_segments
File "faster_whisper\transcribe.py", line 961, in restore_speech_timestamps
File "D:\whisper-fast\__main__.py", line 622, in cli
File "D:\whisper-fast\__main__.py", line 681, in <module>
Traceback (most recent call last):
Calling whisper Purfview's Faster-Whisper done in 00:00:32.6182623

Hm, the auto-download in SE seems to break it... SE must be corrupting the model...

Purfview commented 10 months ago

Can you share that wav?

niksedk commented 10 months ago

> Can you share that wav?

SE seems to break the model during download...

(Edit: this was due to vocabulary.txt being renamed to vocabulary.json.)

niksedk commented 10 months ago

SE beta updated with Purfview's Faster Whisper with large-v3: https://github.com/SubtitleEdit/subtitleedit/releases/download/4.0.1/SubtitleEditBeta.zip

Flitskikker commented 10 months ago

Looking nice. :)

One additional (minor) improvement for #7560 and #7601:

Starting situation: [screenshot]

Press "Set end, add new, and go to new": [screenshot]

Press undo / Ctrl+Z: [screenshot]

Would it be possible to re-select the subtitle on the left, as before the action, so I can retry directly?

Flitskikker commented 10 months ago

By the way, #7329 is still happening for me, but I haven't had the time to fully investigate it, besides commenting out some lines and changing some rendering flags, to no avail. I'm sorry...

I assume you haven't encountered it on your end? Or maybe it's related to my settings?

Settings.zip

Nomad234 commented 10 months ago

> SE beta updated with Purfview's Faster Whisper with large-v3: https://github.com/SubtitleEdit/subtitleedit/releases/download/4.0.1/SubtitleEditBeta.zip

I've tried this beta version of Subtitle Edit and the v3 model. Great work. Thank you for your time and effort.

It's better at transcribing Arabic, but it ignores any English words within the audio. I retried transcribing the same audio sample (on the same Subtitle Edit beta version) with the v2 model, and it easily recognized the English words and transcribed them along with the Arabic text.

Does this have to do with the smaller size of this v3 model (is there another, larger version?), and is there a way to force both Arabic and English transcription of the same text?

Thank you for your help.

uckthis commented 10 months ago

@Nomad234, Did it add periods randomly in the middle of sentences?

Purfview commented 10 months ago

> Does this have to do with the smaller size of this v3 model

No.

> Did it add periods randomly in the middle of sentences?

I didn't notice anything like that. Can you share a short sample of this?

uckthis commented 10 months ago

> Does this have to do with the smaller size of this v3 model

> No.

> Did it add periods randomly in the middle of sentences?

> I didn't notice anything like that. Can you share a short sample of this?

Check with "Start uppercase after paragraph" in Common Errors and you'll see many instances. (attachment: 1.txt)

Purfview commented 10 months ago

I meant audio sample.

uckthis commented 10 months ago

Sorry, I can't share the audio for this (NDA). I'll find some public audio and share it.

uckthis commented 10 months ago

(attachment: 1.zip) Try this. V3, English, beam size 8.

Purfview commented 10 months ago

> 1.zip Try this. V3 Eng, beam size 8.

Can't reproduce. Looks good for me with large-v3 and beam 8:

[00:00.000 --> 00:07.060]  Today on This Old House, I'll tour this modern home to show how beautiful features and accessible design go hand-in-hand.
[00:07.900 --> 00:13.420]  We're mixing mortar to patch the original brick on this 1960 mid-century modern.
[00:14.260 --> 00:18.420]  And I'll help the homeowner, Billy, build a DIY ramp for his son at camp.
[00:46.280 --> 00:50.960]  Hey there, I'm Kevin O'Connor and welcome back to our project here in Lexington, Massachusetts.
[00:51.300 --> 00:55.600]  That's part of our 45th season of This Old House.

Your shared srt looks like a post-processed one, so the issue probably is in SE.
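To compare the raw output with a post-processed SRT, you can turn the console lines above into SRT cues directly, with no merging or splitting. Here is a minimal sketch. It assumes every segment line has the `[mm:ss.mmm --> mm:ss.mmm]  text` shape shown above, and it also accepts an optional hours field. The function names are illustrative only and are not taken from SE or Faster-Whisper.

```python
import re

# A segment line looks like "[00:07.900 --> 00:13.420]  We're mixing ..."
SEGMENT_RE = re.compile(r"^\[([\d:.]+) --> ([\d:.]+)\]\s*(.*)$")


def to_seconds(stamp: str) -> float:
    """Convert "mm:ss.mmm" (or "hh:mm:ss.mmm") to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:00:07,060."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def console_to_srt(lines) -> str:
    """Turn segment lines into numbered SRT cues, skipping everything else."""
    cues = []
    for line in lines:
        match = SEGMENT_RE.match(line.strip())
        if not match:
            continue  # e.g. "Starting transcription on: ..."
        start, end, text = match.groups()
        number = len(cues) + 1
        cues.append(
            f"{number}\n{srt_time(to_seconds(start))} --> "
            f"{srt_time(to_seconds(end))}\n{text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    raw = [
        "[00:07.900 --> 00:13.420]  We're mixing mortar to patch the original brick on this 1960 mid-century modern.",
        "[00:14.260 --> 00:18.420]  And I'll help the homeowner, Billy, build a DIY ramp for his son at camp.",
    ]
    print(console_to_srt(raw))
```

Running the raw lines through something like this and diffing the result against SE's output would show whether an oddity such as stray periods comes from the model or from post-processing.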

uckthis commented 10 months ago

You are right. I just redid the file with "Use post-processing" unchecked, and there were no random periods. It is indeed an SE issue; maybe @niksedk could look into it.

Purfview commented 10 months ago

I think post-processing based on reading subtitles from the JSON output would be a nice feature. That way lines would be split very accurately, since the JSON has timestamps for every word.
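Here is a minimal sketch of that idea. It assumes the JSON holds a "segments" list whose entries carry a "words" list of {"word", "start", "end"} objects. That layout resembles OpenAI Whisper's JSON output with word timestamps, but the exact schema of Faster-Whisper's JSON is an assumption here. Words are grouped greedily. A new line starts when the text would exceed a character limit or when the pause before the next word is too long. The 42-character and 0.6 s values are hypothetical, not SE settings.

```python
import json

# Hypothetical limits; SE has its own settings for line length.
MAX_CHARS = 42
MAX_GAP = 0.6  # seconds of silence that forces a new line


def join_words(words):
    """Join word entries into one line of text (Whisper words carry a leading space)."""
    return " ".join(w["word"].strip() for w in words)


def split_by_words(words, max_chars=MAX_CHARS, max_gap=MAX_GAP):
    """Greedily group word dicts into (start, end, text) cues."""
    groups = []
    current = []
    for word in words:
        if current:
            too_long = len(join_words(current + [word])) > max_chars
            too_far = word["start"] - current[-1]["end"] > max_gap
            if too_long or too_far:
                groups.append(current)
                current = []
        current.append(word)
    if current:
        groups.append(current)
    # Each cue starts at its first word and ends at its last word.
    return [(g[0]["start"], g[-1]["end"], join_words(g)) for g in groups]


def cues_from_json(path):
    """Load the (assumed) JSON layout and split all its words into cues."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    words = [w for seg in data["segments"] for w in seg.get("words", [])]
    return split_by_words(words)


if __name__ == "__main__":
    # Made-up timings, for illustration only.
    sample = [
        {"word": " Hello", "start": 0.0, "end": 0.3},
        {"word": " there.", "start": 0.3, "end": 0.6},
        {"word": " Good", "start": 2.0, "end": 2.2},
        {"word": " morning.", "start": 2.2, "end": 2.6},
    ]
    for cue in split_by_words(sample):
        print(cue)
```

Because each line takes its start from its first word and its end from its last word, the split points land on real word timestamps rather than on times estimated by dividing a segment. The same grouping could also be applied to output written with one word per line.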

Nomad234 commented 10 months ago

> Does this have to do with the smaller size of this v3 model?
>
> No.

Thank you for your prompt reply. But what could possibly cause it to ignore the English words in otherwise predominantly Arabic speech? Is there some kind of setting that could solve this problem?

Thanks for helping me out!

Purfview commented 10 months ago

> But what could possibly be the cause for neglecting the English words in an otherwise predominantly Arabic voice text?

Probably the model was trained that way.

> Is there some kind of setting that could solve this problem?

Choose another model.

Purfview commented 10 months ago

@niksedk large-v3 adds one more language: Cantonese - yue.

Purfview commented 10 months ago

> is there another larger version?

@Nomad234 Actually there is, but SE only offers the "standard reference Whisper" models, whereas Faster-Whisper can have 7 standard model variations per size. :)

To use them you would need to run Faster-Whisper directly.

Purfview commented 10 months ago

Released Standalone Faster-Whisper r160.4.

uckthis commented 10 months ago

> Released Standalone Faster-Whisper r160.4.

Avast puts it in quarantine.

niksedk commented 10 months ago

> Released Standalone Faster-Whisper r160.4.

thx :) Beta updated: https://github.com/SubtitleEdit/subtitleedit/releases/download/4.0.1/SubtitleEditBeta.zip

Purfview commented 10 months ago

> > Released Standalone Faster-Whisper r160.4.
>
> Avast puts it in quarantine.

Yeah, Avast and AVG flag all PyInstaller >= v6 files as generally suspicious. You can send them a false-positive report, but there is no point for Avast because its support is very slow. It usually takes a few weeks to process a report, and by then there will be a new version... AVG support is pretty fast, about a few days.

And there are a few other products I don't know -> VirusTotal for r160.4.

Nomad234 commented 10 months ago

> > is there another larger version?
>
> @Nomad234 Actually there is, but SE has selection only for "standard reference Whisper" ones, when Faster-Whisper can have 6 standard model variations per one size. :)
>
> To use them you would need to run Faster-Whisper directly.

For a noob like me, can you tell me how to prompt for those models in cmd (in the Whisper-Faster folder)? And in your opinion, which model could best serve my use case (mainly Arabic speech, with occasional English)?

Appreciate your help!

Purfview commented 10 months ago

@Nomad234 If it's not about Subtitle Edit, then please open issues in the right repository. In this case, that's here -> https://github.com/Purfview/whisper-standalone-win/issues

diomed commented 10 months ago

@niksedk Can you check that OCR thing?

diomed commented 10 months ago

Btw, how do I remove the video player window? I don't need it, and I've forgotten how since I overwrote my settings with a new file.

darnn commented 10 months ago

Use the layout icon: [screenshot]. Or assign a keyboard shortcut to it in Settings -> Shortcuts -> General -> Choose layout.

Purfview commented 10 months ago

Released Whisper-Faster r160.5. large-v3 now uses this model by default -> https://huggingface.co/Purfview/faster-whisper-large-v3-fp16

Implemented a request to output subtitles with one word per line via the --one_word switch. That way you can get more accurate timestamps on splits. People use "Merge short lines..." on these, but the "post-processing" option produces somewhat borked output. Here is an example of such subs: https://we.tl/t-zSDxp8pvmm

Nomad234 commented 10 months ago

Is this available in Subtitle Edit right now? I can't find any new model in the drop-down menu. Should I delete the previous v3 model (1.5 GB) for it to appear?