BBC-Esq opened this issue 11 months ago
cuBLAS & cuDNN are required to use faster-whisper.
If you want CUDA 12 you can build ctranslate2 from source.
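If you want to sanity-check whether your installed or self-built ctranslate2 actually sees CUDA, something along these lines should do it (just a sketch, check the CTranslate2 docs for your version):

```python
# Sketch: check whether the installed ctranslate2 build can see a CUDA device.
import ctranslate2

print("ctranslate2 version:", ctranslate2.__version__)
print("CUDA devices visible:", ctranslate2.get_cuda_device_count())
# Compute types the CUDA backend supports on this machine (e.g. float16, int8_float16).
print("Supported compute types:", ctranslate2.get_supported_compute_types("cuda"))
```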
Also, avoid spamming issues all over the place.
I'm sorry, how did I spam "everywhere"? My suggestion was about having it supported out of the box without having to compile; is that an inappropriate suggestion? I changed the title of the issue, if that's more appropriate.
@BBC-Esq Yes, you are spamming all over the place. STOP IT!
It's not even a faster-whisper issue...
How am I spamming everywhere? I posted one issue on ctranslate2 and one on faster-whisper. I like ctranslate2/faster-whisper a lot and respect the people who work on them, but I'm getting too much flak. Thanks.
The suggestion is not inappropriate, it's a legitimate request; even I am struggling to build ctranslate2.
But the way you open multiple issues is inappropriate; you already joined the open issue about CUDA 12.
The suggestion is not inappropriate, it's a legitimate request; even I am struggling to build ctranslate2.
But the way you open multiple issues is inappropriate; you already joined the open issue about CUDA 12.
Thanks for the response. I'll admit GitHub is confusing to me, along with the structure of opening issues (let alone pull requests). Can you help me understand where I opened multiple issues and how to delete excessive issues? As far as I know I did one issue on ctranslate2 and one on faster-whisper. As a noob I'm conscious of managing the number of "issues" on my own GitHub so...
There's an open issue about CUDA 12; you already joined it, but you opened more issues.
A way of respecting the devs is to avoid sending them multiple notifications about the same thing they're already working on.
You cannot delete issues you opened, only the repo owner can delete them; you can, however, close your own issues.
Aha! Thanks, I'll find the open issue for CUDA 12 and close this one out then!
There you are, already posting in the right place -> https://github.com/OpenNMT/CTranslate2/issues/1250
As far as I know I did one issue on ctranslate2 and one on faster-whisper.
Both are spam. One was closed -> https://github.com/OpenNMT/CTranslate2/issues/1563
Yep! I see that now. Neither are spam. One was a mistake and was rightfully closed by the admin. Here's my last comment on the topic.
The post above is spam too. What's the point of posting screenshots of your spam? We already know where you spammed.
If the admin of this repository instructs me to stop posting things like what I did, I will. Otherwise, please stop messaging me through this forum. I'm trying to be constructive on this library, which I have a lot of respect for. Goodbye.
Yep! I see that now. Neither are spam. One was a mistake...
That's trolling now.
Otherwise, please stop messaging me through this forum.
I will if God instructs me to. 😆
@Purfview Your responses are inapposite to an end user. Of course it is a faster-whisper issue because it depends on ctranslate2.
Anyway, ctranslate2 is dragging its feet on CUDA 12.x support; you have to build it yourself if you want to remove the errors. This is not a trivial task for an end user who is not versed in Python environments and CUDA dependencies.
@Qubitium No, it's not an issue for this repo because it can't be fixed here. By your logic you could post it on the Python forums too, because the issue appears when you use Python. 😉
Sad news: the tests show that "Faster-Whisper CUDA v12" has a ~10% drop in performance, so stay with CUDA v11.
RTX 3050 GPU:
float16: -10% drop in speed
bfloat16: -8% drop in speed
int8_bfloat16: 0% same
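(The comparison is just timing the same transcription under each compute type; a rough sketch like the one below, with a placeholder model size and audio file, shows the shape of the test, not the exact script used.)

```python
# Rough benchmark sketch (not the exact test script): time faster-whisper
# transcription of the same audio under different compute types.
import time
from faster_whisper import WhisperModel

AUDIO = "sample.wav"      # placeholder audio file
MODEL_SIZE = "large-v2"   # placeholder model size

for compute_type in ("float16", "bfloat16", "int8_bfloat16"):
    model = WhisperModel(MODEL_SIZE, device="cuda", compute_type=compute_type)
    start = time.perf_counter()
    segments, info = model.transcribe(AUDIO)
    list(segments)        # segments are generated lazily; force full decoding
    print(f"{compute_type}: {time.perf_counter() - start:.1f}s")
```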
@Purfview Which specific CUDA 12 version and NVIDIA driver? Also, which platform, Linux or Windows? Some early CUDA 12 driver combos, especially on Windows 11, dropped performance due to VRAM swapping to CPU RAM, but I'm not sure about the latest.
@Qubitium I think 546.33 and the rest are currently the latest official versions. On Windows.
Check and disable NVIDIA's "virtual" VRAM feature they introduced with the 12.x drivers on Windows, which automatically swaps VRAM to host RAM. Lots of users got caught by this killing performance.
Check and disable NVIDIA's "virtual" VRAM feature they introduced with the 12.x drivers on Windows, which automatically swaps VRAM to host RAM. Lots of users got caught by this killing performance.
Thx for the info. Looks like it's called "CUDA Sysmem Fallback"; disabling it didn't have any practical influence on the results. But one key setting was found -> "Hardware-accelerated GPU scheduling", it should be ON for performance.
Diff from the tests in a new environment:
Various OS optimizations. Actual CUDA 12 install on the system.
CUDA Sysmem Fallback: OFF
Hardware-accelerated GPU scheduling: ON
float16: -1% drop in speed
bfloat16: -5% drop in speed
int8_float16: -21% drop in speed
float16: -1% drop in speed
bfloat16: -5% drop in speed
int8_float16: -21% drop in speed
The float16 number there is within the margin of error, but bf16 and int8_float16 are not. My suggestion: use Windows as your desktop, but do everything CUDA-related under native (not virtualized) Linux to get 100% speed.
Do you have benchmarks of CUDA 12 vs CUDA 11 on Linux?
Stats at my repo show only 3% Linux users...
Do you have benchmarks of CUDA 12 vs CUDA 11 on Linux?
Stats at my repo show only 3% Linux users...
Nope. FYI, no one uses Windows to run serious AI training or to host AI API inference. That tells you where NVIDIA's CUDA optimization priorities are. This also applies to quality assurance and regression testing: there is a lot more priority on Linux CUDA stability/regression testing, in my view. The driver internals are platform agnostic, but testing-wise I'd bet my money they do more CUDA testing on Linux.
Hello, is there any way to actually see the scripts or know more details about the test itself? I don't say this to cast doubt, but simply because there is a wide range of circumstances that could lead to different results, and it's generally good practice to have multiple people verify the results repeatedly. I was myself testing VRAM usage of a program of mine and the results varied significantly, so I had to run multiple tests to try and get an average.
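For what it's worth, when I was measuring VRAM I ended up sampling it repeatedly; a sketch along these lines (using NVML via the nvidia-ml-py package, imported as pynvml) averages a few readings:

```python
# Sketch: sample GPU memory usage a few times and report the average,
# using NVML via the nvidia-ml-py package (imported as pynvml).
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # GPU 0

samples = []
for _ in range(10):
    samples.append(pynvml.nvmlDeviceGetMemoryInfo(handle).used)
    time.sleep(0.5)

print(f"Average VRAM used: {sum(samples) / len(samples) / 1024**2:.0f} MiB")
pynvml.nvmlShutdown()
```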
Also, @Purfview, are you saying that CTranslate2 and/or faster-whisper shouldn't be making a CUDA 12+ compatible build at all, or just that you want builds that are compatible with CUDA 11+ to hang around for a while?
Frankly we are wayyyyyy off topic. Purfview should start a new topic. I don't want to be thrown into spam prison. If this is not spam, I don't know what is. j/k Merry xmas!
I simply would like to know if there will be out-of-the-box CUDA 12 support without having to compile from source. I don't know if your comment was directed at me, but I don't believe I'm posting spam. I've gotten more flak from posting things here than on any other GitHub repo I've posted on...
By the way, just to be clear, even if your comments were not directed at me, I view @Purfview's comments as relevant to this discussion and not "spam" either. Both of our comments are relevant to the topic of CUDA 12+ compatibility... The topic is whether CUDA 12+ support should be added without having to compile.
Wow, what a toxic environment here! BBC-Esq, this request about CUDA 12 is legit, it's not "spam".
Wow, what a toxic environment here! BBC-Esq, this request about CUDA 12 is legit, it's not "spam".
Obviously it's spam. And BBC-Esq is known toxic troll & spammer.
Calling someone as a "known toxic troll & spammer" does not adhere to the GitHub Community Code of Conduct:
Be respectful - Working in a collaborative environment means disagreements may happen. But remember to criticize ideas, not people.
Does anyone have a link to the checks for the CUDA 12+ supported wheels? I can't seem to find them; all the checks passed except the upload to PyPI, if I recall correctly. I don't see that a new version of faster-whisper has been bumped yet and was just wondering about the status of CUDA 12+ support! Thanks!
@skripnik Your ideas about things are bad.
Ctranslate2 just released version 4.0, which now has CUDA 12+ support! I'm wondering what changes, if any, would need to be made to the faster-whisper library; perhaps I can help on the Python side of things!
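On the testing side, a quick smoke test like the one below (placeholder audio path and model size) would at least confirm that the new CUDA 12 build loads and transcribes under faster-whisper:

```python
# Smoke test sketch: confirm faster-whisper runs on a CUDA 12 ctranslate2 4.x wheel.
import ctranslate2
from faster_whisper import WhisperModel

print("ctranslate2:", ctranslate2.__version__)   # expect a 4.x version for CUDA 12

model = WhisperModel("small", device="cuda", compute_type="float16")
segments, info = model.transcribe("sample.wav")  # placeholder audio file
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```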
Is CUDA 12.1 support coming or in the works? Just curious, since faster-whisper keeps looking for cublas11.dll... And although I don't use cudnn, I'm assuming that would be another aspect to consider? Thanks.