SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

CUDA 12.1 and CUBLAS and CUDNN without having to compile from source #584

Open BBC-Esq opened 11 months ago

BBC-Esq commented 11 months ago

Is CUDA 12.1 support coming or in the works? Just curious since faster-whisper keeps looking for cublas11.dll...and although I don't use cudnn, I'm assuming that would be another aspect to consider? Thanks.

phineas-pta commented 11 months ago

cublas & cudnn are requirements to use faster whisper

if u want cuda 12 u can build ctranslate2 from source
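As a side note for anyone debugging the missing-DLL error above: the thread includes no diagnostic script, but a minimal Python sketch like the one below can show whether the dynamic loader finds cuBLAS/cuDNN at all. The helper name is mine, and resolution is platform-dependent (on Windows the DLLs carry versioned names such as cublas64_11.dll, which find_library may not resolve), so treat this as illustrative only:

```python
# Hedged diagnostic, not part of faster-whisper: ask the system loader
# whether it can resolve the CUDA libraries CTranslate2 loads at runtime.
import ctypes.util

def locate_cuda_libs(names=("cublas", "cudnn")):
    """Map each library name to the path the loader resolves, or None."""
    return {name: ctypes.util.find_library(name) for name in names}

if __name__ == "__main__":
    for name, path in locate_cuda_libs().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

A `None` entry only means the loader's default search path missed the library; it may still exist outside `PATH`/`LD_LIBRARY_PATH`.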

also avoid spamming issues all over the place

BBC-Esq commented 11 months ago

I'm sorry, how did I spam "everywhere"? My suggestion was about having CUDA 12 supported out of the box without having to compile; is that an inappropriate suggestion? I changed the title of the issue, if that's more appropriate.

Purfview commented 11 months ago

@BBC-Esq Yes, you are spamming all over the place. STOP IT!

It's not even a faster-whisper issue...

BBC-Esq commented 11 months ago

How am I spamming everywhere? I posted one issue in ctranslate2 and one in faster-whisper. I like ctranslate2/faster-whisper a lot and respect the people who work on it, but I'm getting too much flak. Thanks.

phineas-pta commented 11 months ago

the suggestion is not inappropriate, it's a legit request, even me im struggling to build ctranslate2

but the way u open multiple issues is inappropriate, u already joined the open issue about cuda 12

BBC-Esq commented 11 months ago

the suggestion is not inappropriate, it's a legit request, even me im struggling to build ctranslate2

but the way u open multiple issues is inappropriate, u already joined the open issue about cuda 12

Thanks for the response. I'll admit GitHub is confusing to me, as is the structure of opening issues (let alone pull requests). Can you help me understand where I opened multiple issues and how to delete excessive issues? As far as I know, I opened one issue on ctranslate2 and one on faster-whisper. As a noob, I'm conscious of managing the number of "issues" on my own GitHub, so...

phineas-pta commented 11 months ago

there's an open issue about cuda 12, u already joined in, but u opened more issues

a way of respecting the devs is to avoid sending them multiple notifications about the same thing they're working on

u cannot delete issues u opened, only the repo owner can delete them, meanwhile u can close issues

BBC-Esq commented 11 months ago

Aha! Thanks, I'll find the open issue for CUDA 12 and close this out then!

Purfview commented 11 months ago

There you are already posting in the right place -> https://github.com/OpenNMT/CTranslate2/issues/1250

As far as I know I did one issue on ctranslate2 and one on faster-whisper.

Both are spam. One was closed -> https://github.com/OpenNMT/CTranslate2/issues/1563

BBC-Esq commented 11 months ago

Yep! I see that now. Neither are spam. One was a mistake and was rightfully closed by the admin. Here's my last comment on the topic.

Purfview commented 11 months ago

The post above is spam too. What's the point of posting screenshots of your spam? We already know where you spammed.

BBC-Esq commented 11 months ago

If the admin of this repository instructs me to stop posting things like what I did, I will. Otherwise, please stop messaging me through this forum. I'm trying to be constructive on this library, which I have a lot of respect for. Goodbye.

Purfview commented 11 months ago

Yep! I see that now. Neither are spam. One was a mistake...

That's trolling now.

Otherwise, please stop messaging me through this forum.

I will if God instructs me to. 😆

Qubitium commented 11 months ago

@Purfview Your responses are inapposite to an end user. Of course it is a faster-whisper issue because it depends on ctranslate2.

Anyway, ctranslate2 is dragging its feet on CUDA 12.x support; you have to build it yourself if you want to remove the errors. This is not a trivial task for an end user who is not versed in Python environments and CUDA dependencies.

Purfview commented 11 months ago

@Qubitium No, it's not an issue for this repo because it can't be fixed here. By that logic you could post it on the Python forums, because it's an issue when you use Python. 😉

Purfview commented 10 months ago

Sad news, the tests show that "Faster-Whisper CUDA v12" has a -10% drop in performance, so stay with CUDA v11.

RTX 3050 GPU:

float16:      -10% drop in speed 
bfloat16:      -8% drop in speed
int8_bfloat16:  0% same
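The thread doesn't show the benchmark script, so as a hedged illustration only: percentage deltas like the -10% above are typically derived from wall-clock timings of the same workload under each configuration. A generic sketch (the helper names are mine, not from any repo):

```python
import time

def bench(fn, repeats=3):
    """Best-of-N wall-clock time for a callable, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def percent_change(baseline_s, candidate_s):
    """Speed of candidate relative to baseline; negative means slower."""
    return round((baseline_s / candidate_s - 1.0) * 100)

# e.g. if a CUDA 11 run took 9.0 s and the CUDA 12 run took 10.0 s:
print(percent_change(9.0, 10.0))  # -10, i.e. a 10% drop in speed
```

Best-of-N is used rather than the mean so that one-off stalls (disk cache, GPU clock ramp-up) don't skew the comparison.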

Qubitium commented 10 months ago

@Purfview Which specific v12 and NVIDIA driver? Also, which platform, Linux or Windows? There were some early v12 CUDA/driver issues, especially with Win11, due to VRAM swapping to CPU RAM that dropped perf, but not sure about the latest.

Purfview commented 10 months ago

@Qubitium I think "546.33", and the other stuff, are currently the latest official versions. On Windows.

Qubitium commented 10 months ago

Check and disable NVIDIA's "virtual" VRAM GPU feature they introduced in 12.x on Windows, which auto-swaps VRAM to host RAM. Lots of users got caught by this killing perf.

Purfview commented 10 months ago

Check and disable NVIDIA's "virtual" VRAM GPU feature they introduced in 12.x on Windows, which auto-swaps VRAM to host RAM. Lots of users got caught by this killing perf.

Thx for the info. Looks like it's called "CUDA Sysmem Fallback"; disabling it didn't have a practical influence on the results. But a key setting was found -> "Hardware-accelerated GPU scheduling", it should be ON for performance.

Diff from the tests in a new environment:

Various OS optimizations. Actual CUDA12 install to system.
CUDA Sysmem Fallback: OFF
Hardware-accelerated GPU scheduling: ON

float16:       -1% drop in speed 
bfloat16:      -5% drop in speed
int8_float16: -21% drop in speed
Qubitium commented 10 months ago

float16:       -1% drop in speed 
bfloat16:      -5% drop in speed
int8_float16: -21% drop in speed

The float16 data there shows it's within the margin of error, but not bf16 or int8_float16. My suggestion: use Windows as your desktop, but do everything CUDA-related under native (not virtualized) Linux to get 100% speed.

Purfview commented 10 months ago

Do you have benchmarks CUDA12 vs CUDA11 in Linux?

Stats at my repo show only 3% Linux users...

Qubitium commented 10 months ago

Do you have benchmarks CUDA12 vs CUDA11 in Linux?

Stats at my repo show only 3% Linux users...

Nope. FYI, no one uses Windows to run serious AI training or to host AI API inference. That tells you where NVIDIA's CUDA platform optimization priorities are. This also applies to quality assurance and regression testing. There is a lot more priority on Linux CUDA stability/regression testing, in my view. The driver internals are platform-agnostic, but testing-wise, I'd bet my money they do more CUDA testing on Linux.

BBC-Esq commented 10 months ago

Hello, is there any way to actually see the scripts or learn more details about the test itself? I don't say this to cast doubt, but simply because there is a wide range of circumstances that could lead to different results, and it's generally good practice to have multiple people verify the results repeatedly. I myself was testing the VRAM usage of a program of mine and the results varied significantly, so I had to run multiple tests to get an average.

Also, @Purfview, are you saying that CTranslate2 and/or faster-whisper shouldn't be making a CUDA 12+ compatible build at all, or just that you want builds compatible with CUDA 11+ to stick around for a while?

Qubitium commented 10 months ago

Frankly, we are wayyyyyy off topic. Purfview should start a new topic. I don't want to be thrown into spam prison. If this is not spam, I don't know what is. j/k Merry Xmas!

BBC-Esq commented 10 months ago

I simply would like to know if there will be out-of-the-box CUDA 12 support without having to compile from source. I don't know if your comment was directed at me, but I don't believe I'm posting spam. I've gotten more flak from posting things here than on any other GitHub repo I've posted on...

BBC-Esq commented 10 months ago

By the way, just to be clear, even if your comments were not directed at me, I view @Purfview's comments as relevant to this discussion and not "spam" either. Both of our comments are relevant to the topic of CUDA 12+ compatibility. The topic is whether CUDA 12+ support should be added without having to compile.

skripnik commented 10 months ago

Wow, what a toxic environment here! BBC-Esq, this request about CUDA 12 is legit, it's not a "spam".

Purfview commented 10 months ago

Wow, what a toxic environment here! BBC-Esq, this request about CUDA 12 is legit, it's not a "spam".

Obviously it's spam. And BBC-Esq is known toxic troll & spammer.

skripnik commented 10 months ago

Calling someone a "known toxic troll & spammer" does not adhere to the GitHub Community Code of Conduct:

Be respectful - Working in a collaborative environment means disagreements may happen. But remember to criticize ideas, not people.

BBC-Esq commented 10 months ago

Does anyone have a link to the checks for the CUDA 12+ supported wheels? I can't seem to find where the checks all passed except the upload to PyPI, if I recall correctly. I don't see that a new version of faster-whisper has been bumped yet and was just wondering about the status of CUDA 12+ support! Thanks!

Purfview commented 10 months ago

@skripnik Your ideas about things are bad.

BBC-Esq commented 8 months ago

CTranslate2 just released version 4.0, which now has CUDA 12+ support! I'm wondering what changes, if any, would need to be made to the faster-whisper library; perhaps I can help on the Python side of things!

https://github.com/OpenNMT/CTranslate2/releases/tag/v4.0.0
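Since CTranslate2 4.0 is the first release line with CUDA 12 wheels, one simple guard a downstream script could use (the helper name here is hypothetical, a sketch rather than anything in faster-whisper) is to check the installed major version before requesting a CUDA device:

```python
# Hedged sketch: detect whether the installed ctranslate2 is from the
# CUDA 12 era (>= 4.0); returns False if the package isn't installed.
from importlib import metadata

def ctranslate2_has_cuda12_wheels():
    """True if installed ctranslate2 is >= 4.0, the first CUDA 12 release."""
    try:
        version = metadata.version("ctranslate2")
    except metadata.PackageNotFoundError:
        return False
    return int(version.split(".")[0]) >= 4
```

This only checks the package version, not whether the CUDA 12 runtime libraries themselves are present on the machine.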
