SubtitleEdit / subtitleedit

the subtitle editor :)
http://www.nikse.dk/SubtitleEdit/Help
GNU General Public License v3.0

Whisper GPU mode not working #6781

Closed martjay closed 1 year ago

martjay commented 1 year ago

777777

rsmith02ct commented 1 year ago

Which Whisper are you using? There are 3.

martjay commented 1 year ago

> Which Whisper are you using? There are 3.

No, I only saw Whisper.cpp, which should support GPU.

rsmith02ct commented 1 year ago

Whisper CPP is CPU only. Const-me is GPU and Whisper Open AI uses CUDA on some systems (works on my desktop but not my laptop). After opening the Whisper menu, right-click and you'll see the 3 options.

MbuguaDavid commented 1 year ago

> Which Whisper are you using? There are 3.

> No, I only saw Whisper.cpp, which should support GPU.

Have you tried right-clicking in the Audio to text (Whisper) interface to select Whisper Const-me? [screenshot: right-click to select]

Try that and see.

martjay commented 1 year ago

> Whisper CPP is CPU only. Const-me is GPU and Whisper Open AI uses CUDA on some systems (works on my desktop but not my laptop). After opening the Whisper menu, right-click and you'll see the 3 options.

Wow, this is too awesome, my god, SE is now the best subtitle editing software in the world, thank you for reminding me!

martjay commented 1 year ago

Can this software do bilingual subtitles now? For example, Chinese is displayed above and English is displayed below.

MbuguaDavid commented 1 year ago

> Can this software do bilingual subtitles now? For example, Chinese is displayed above and English is displayed below.

If you mean merging 2 subtitles into one and having both play at the same time, YES. Here's how it works: https://youtu.be/SVyZyMEFi-4

Thanks!
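As an aside, the merge shown in the video can be sketched in a few lines. This is only an illustration of stacking two aligned tracks, not SE's actual implementation, and the `(start, end, text)` cue format is invented for the example:

```python
def merge_bilingual(top_cues, bottom_cues):
    """Stack two aligned subtitle tracks into one bilingual track.

    Each cue is a (start_seconds, end_seconds, text) tuple; the tracks
    are assumed to line up one-to-one, as after merging two matching files.
    """
    merged = []
    for (s1, e1, top), (s2, e2, bottom) in zip(top_cues, bottom_cues):
        # Span the union of both cues' times; first language on top.
        merged.append((min(s1, s2), max(e1, e2), top + "\n" + bottom))
    return merged

chinese = [(0.0, 2.5, "你好，世界")]
english = [(0.0, 2.4, "Hello, world")]
print(merge_bilingual(chinese, english)[0][2])
```

Each merged cue covers the union of the two source cues' time spans, so neither language gets cut off early.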

martjay commented 1 year ago

> If you mean merging 2 subtitles into one and having both play at the same time, YES. Here's how it works. https://youtu.be/SVyZyMEFi-4

Thank you very much, you're amazing! But I hope SE can integrate this feature into the Whisper panel, so we can obtain bilingual subtitles directly instead of spending twice as much time. I hope someone can add this feature, it would be very useful!

o-data commented 1 year ago

Has anyone noticed that Whisper in 3.6.12 is very slow? Compared to beta .11 it's like 5x slower. Is it just me?

rsmith02ct commented 1 year ago

Which implementation of Whisper are you using? I haven't noticed a slowdown. Const-me is still ~5x realtime for example.

rRobis commented 1 year ago

@niksedk I think that the Whisper model "button" in the corner is not intuitive enough. I would never have thought that I can click to choose something there 😄

rsmith02ct commented 1 year ago

Good point, it's confusing. I also noticed that when I made a new portable installation of Whisper and clicked Whisper speech to text, it forced me to download CPP even though I only intend to use Const-me and OpenAI Whisper!

o-data commented 1 year ago

> Which implementation of Whisper are you using? I haven't noticed a slowdown. Const-me is still ~5x realtime for example.

Might just be me if no one else has the same issue; I'll continue to test to see if it gets better.

williethewolf commented 1 year ago

I'm getting "No text to transcribe" error with Const-me. OpenAI works fine.

Purfview commented 1 year ago

> I'm getting "No text to transcribe" error with Const-me. OpenAI works fine

Const-me is for GPU; OpenAI works on CPU by default.

> Const-me is still ~5x realtime for example.

@rsmith02ct How does its speed compare to OpenAI [on CUDA]?

I've tested the new OpenAI b102 build; it's ~23% faster than b80 [tested on CPU only].

rsmith02ct commented 1 year ago

@Purfview I only did one test, but for a 2:30 clip, Const-me processed it in 8s vs 30s for OpenAI vs 35s with CPP on a 7th-gen Intel laptop i7 (base model for all). Note that CUDA fails on this laptop (GTX 1050) for unknown reasons with OpenAI. I haven't had time to run this test on my desktop where CUDA works on both.

Purfview commented 1 year ago

@rsmith02ct I'm only interested in a test on CUDA; please drop a line if you test it.

darnn commented 1 year ago

FWIW, I tried using the large model with the CUDA version, and it gave me the same error regular whisper gives me:

```
File "torch\nn\modules\module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 6.00 GiB total capacity; 5.03 GiB already allocated; 0 bytes free; 5.32 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[15736] Failed to execute script 'main' due to unhandled exception!
```

If you know of a way of resolving it, I'd love to hear it. On the medium model, it's far slower than Const-Me.

niksedk commented 1 year ago

> FWIW, I tried using the large model with the CUDA version, and it gave me the same error regular whisper gives me: torch.cuda.OutOfMemoryError: CUDA out of memory. [...]
>
> If you know of a way of resolving it, I'd love to hear it. On the medium model, it's far slower than Const-Me.

More memory?

darnn commented 1 year ago

That would make sense, but I found various things online claiming that's not necessarily the problem. None of the solutions there worked for me, though, so maybe it is just that.

niksedk commented 1 year ago

I checked the memory usage at one point using the large model... I think it was 15 GB (it might have been an older version of Whisper CPP). And I can only use the tiny/small models with GPU :(

darnn commented 1 year ago

Yeah, I think I saw something like 9-10 GB... oh well. I'll stick with Const-Me.

Purfview commented 1 year ago

> FWIW, I tried using the large model with the CUDA version, and it gave me the same error regular whisper...
>
> If you know of a way of resolving it, I'd love to hear it.

Your GPU has 6 GB; you need ~10 GB of VRAM for the large model. It's the same as "regular whisper" except that it's standalone and compatible with Win7.

> On the medium model, it's far slower than Const-Me.

Do you have timings? "Far" doesn't tell me much.
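An aside on those VRAM numbers: a rough fallback check could be sketched as below. The figures are only the ballpark values quoted in this thread (large ~10 GB), not exact requirements:

```python
# Ballpark VRAM needs in GB for OpenAI Whisper models, per the figures
# discussed in this thread (large ~10 GB); treat as rough guides only.
APPROX_VRAM_GB = {"large": 10, "medium": 5, "small": 2, "base": 1, "tiny": 1}

def largest_model_that_fits(free_vram_gb):
    """Pick the biggest Whisper model that fits in the available VRAM."""
    for model in ("large", "medium", "small", "base", "tiny"):
        if APPROX_VRAM_GB[model] <= free_vram_gb:
            return model
    return None  # nothing fits on the GPU; fall back to CPU

print(largest_model_that_fits(6))  # a 6 GB card, like the one in the traceback above
```

On a 6 GB card this picks "medium", which matches the experience reported above: large fails with CUDA out of memory, medium runs.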

Purfview commented 1 year ago

> Anyone feel that 3.6.12 Whisper is very slow? Comparing to the beta .11 it like 5x slower. Is it just me?

@o-data I noticed something similar; it's so random that I don't know what's going on, plus I get different results on each identical run...

Purfview commented 1 year ago

@rsmith02ct @darnn Maybe you'll be interested in testing this:

https://github.com/SubtitleEdit/subtitleedit/discussions/6796

rsmith02ct commented 1 year ago

Hi Purfview, I had a chance to test it on my desktop.

The original video is 2:35. I used the medium model for all:

- CPP: 1:07
- Const-me: 0:11
- Whisper OpenAI: 0:29

and, for reference, Microsoft Azure through VEGAS Pro: 1:18.
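For perspective, those timings translate to real-time factors (clip length divided by processing time). A quick back-of-the-envelope calculation using only the numbers above:

```python
def realtime_factor(clip_seconds, processing_seconds):
    """Seconds of audio transcribed per second of wall-clock time."""
    return clip_seconds / processing_seconds

clip = 2 * 60 + 35  # the 2:35 test clip, in seconds
timings = {"CPP": 67, "Const-me": 11, "Whisper OpenAI": 29, "Azure (VEGAS)": 78}
for engine, secs in timings.items():
    print(f"{engine}: {realtime_factor(clip, secs):.1f}x realtime")
```

By this measure Const-me runs at roughly 14x realtime on that desktop, versus about 5x for Whisper OpenAI and about 2x for CPP and Azure.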

Purfview commented 1 year ago

@rsmith02ct Pretty close to Const-me. Btw, the "standalone" build has startup lag; for me it's ~12s, so the real timing for your run would be ~0:17.

Test this: https://github.com/SubtitleEdit/subtitleedit/discussions/6796 [atm CPU only]. On CUDA you would be able to run the large model with it, and it probably runs faster than Const-me.

rsmith02ct commented 1 year ago

I timed it from when I hit Generate, using a separate stopwatch, as I care about the total time from a user perspective. Interesting about the lag; maybe that can be addressed? If the lag is fixed regardless of video size, it may be trivial for the longer interviews I typically transcribe (about 1 hour).

I'd rather test Whisper implementations through SE that use CUDA as I don't see a benefit otherwise.

Purfview commented 1 year ago

Of course, ~ten seconds is trivial for long runs, but you need to keep it in mind when comparing tests that only take a few seconds. :)

That lag is a Python issue and can't be avoided.

> I'd rather test Whisper implementations through SE that use CUDA as I don't see a benefit otherwise.

Maybe on CPU it runs at about the same speed as OpenAI on CUDA. :) Maybe this weekend I'll compile a CUDA build.

vivadavid commented 1 year ago

Hi, I just wanted to give my feedback:

I was able to use the Const-me large model, even if my Nvidia RTX 2060 has only 6 GB of VRAM.

Just to compare the speed, I tried to download a CPP model (Spanish tiny, to be more precise), and instead of getting the corresponding .PT file, the same .BIN file was downloaded again and replaced a previously downloaded, identical file. Needless to say, I had selected "CPP" in the top right corner.

Purfview commented 1 year ago

> I was able to use the ConstMe large model, even if my Nvidia RTX 2060 has only 6 GB of VRAM.

You need ~10 GB for the OpenAI large model.

> Just to compare the speed, I tried to download a CPP model (Spanish tiny to be more precise), and instead of getting the corresponding .PT file, the same .BIN file was downloaded again and replaced a previously downloaded file, which was identical. Needless to say, I had selected 'CPP' on the top right corner.

CPP and Const-me use the same models; ".PT" files are for OpenAI. You should compare to OpenAI, or better, to Faster.

vivadavid commented 1 year ago

> ~10 GB you need for OpenAI large.
>
> CPP and ConstMe use same models, ".PT" files are for OpenAI. You should compare to OpenAI or better to Faster.

Thanks for the information! There are so many concepts around Whisper that I get confused; I'm trying to learn a little bit. I didn't know, for example, that Const-me and CPP used the same models. If you have a good Nvidia card, Const-me is much faster, but I wonder if the quality of the transcription is exactly the same.
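One simple way to quantify how much two engines' transcripts differ is a word-level similarity ratio. This sketch uses Python's standard difflib, not anything built into SE, and the sample sentences are made up:

```python
import difflib

def transcript_similarity(a, b):
    """Word-level similarity between two transcripts, 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()

# Hypothetical outputs from two engines on the same clip
cpp_text = "the quick brown fox jumps over the lazy dog"
const_me_text = "the quick brown fox jumped over the lazy dog"
print(f"{transcript_similarity(cpp_text, const_me_text):.2f}")  # → 0.89
```

Export a transcript from each engine, run both strings through this, and a ratio near 1.0 means the results are practically identical.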

o-data commented 1 year ago

The results might be slightly different; you can try both and compare.