ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Can't disable gpu #1762

Open thewh1teagle opened 10 months ago

thewh1teagle commented 10 months ago

`whisper_context_params.use_gpu = false;` doesn't work. It still tries to use OpenCL, which leads to a crash (specific to my case with OpenCL).

I use it in my project vibe, and this option is very important because I want to give users the best possible speed with the GPU, but fall back in case of error.

bobqianic commented 10 months ago

Hi @slaren , is there a way to completely turn off OpenCL during runtime? Thanks!

ggerganov commented 10 months ago

Currently, there is no way to disable the GPU completely when the project is built with OpenCL support. Will think about fixing this.

In the meantime, does the information from https://github.com/ggerganov/whisper.cpp/issues/888 help in any way?

thewh1teagle commented 10 months ago

@ggerganov It doesn't help. Currently I use OpenBLAS, so at least the performance is much better than without it. Looking to improve vibe to get the best possible performance.

chuck-fyn commented 10 months ago

@ggerganov I am also trying to turn off GPU use to allow for background processing on the iPhone. Apologies if this is obvious, but is it possible for me to turn off the OpenCL support so that I can turn off the GPU use?

ggerganov commented 10 months ago

You can easily update ggml.c to avoid all GPU calls (CUDA, OpenCL, etc.) if a global flag is set. For example here:

https://github.com/ggerganov/whisper.cpp/blob/1f50a7d29f85f221368e81201780e0c8dd631076/ggml.c#L9816-L9825

You can add a `void ggml_gpu_set(bool enable);` function that sets a global boolean flag, and check the flag before each GPU call in ggml.c.

This is currently not officially supported in ggml because I want to figure out a better API, but for a quick workaround I think this is the only option atm.

thewh1teagle commented 10 months ago

@ggerganov I think that eventually it will be useful to have an `is_available()` function for each GPU method (CUDA, Core ML, etc.)

thewh1teagle commented 10 months ago

@ggerganov Can we somehow get `is_available()` functions for each GPU platform, so we can easily decide which to use? I just added Core ML support to the vibe app and the performance is incredible (20x faster and even more).

Also, about the option to disable the GPU using `use_gpu = false`: do you have any progress / plans on it? I'm eager to add GPU support for Linux and Windows as well.

WilliamTambellini commented 7 months ago

hi, same issue on Linux with a CUDA build: it still seems to init and use the CUDA GPU despite the `-ng` CLI argument:

./cuda/main -m ggml-base.en.bin -f samples/jfk.wav -ng
...
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: Quadro RTX 3000, compute capability 7.5, VMM: yes
...

best

slaren commented 7 months ago

@WilliamTambellini this no longer happens with the CUDA backend after the sync with ggml from yesterday.

WilliamTambellini commented 7 months ago

Tks @slaren Superb, I will pull, rebuild, and retest. Congrats.

WilliamTambellini commented 7 months ago

Tks @slaren @ggerganov 1.5.4 is already a few months old, from Jan 5th. Would you mind doing a new release? Best

ggerganov commented 7 months ago

I'll probably make a new one soon, yes

thewh1teagle commented 6 months ago

Is there any update regarding this feature?

Also, I can see that ggml.c checks whether the CPU has AVX2 / CUDA etc. at compile time rather than at runtime using `__builtin_cpu_supports("...")`. That results in crashes, for instance if the CPU doesn't support AVX2, whereas instead we could add assertions for all of that with better errors, and maybe choose which GPU platform to use (not sure if possible): for instance, I would like to use CLBlast by default, but use CUDA on Windows if it is available.