Maknee / minigpt4.cpp

Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML)
MIT License

How to accelerate inference? #16

Open dengtianbi opened 10 months ago

dengtianbi commented 10 months ago

Hi,

I enabled the cuBLAS compilation option.

The problem is that it does not load or process everything in GPU memory (VRAM).

What is the best command line to build and run each model as fast as possible on an RTX 3090 with 24GB of VRAM?

Maknee commented 10 months ago

Take a look at #15. The MiniGPT-4 model is composed of two models (vision and text). The vision model does not support GPU usage, but the text model (Vicuna) does.

Try enabling LLAMA_CUBLAS and see if you can run part of the model on the GPU. I haven't tested these flags before, but I would assume that they would work.
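A minimal sketch of what that build might look like, passing the two options mentioned in this thread at configure time (the option names are taken from this thread; whether LLAMA_CUBLAS is forwarded to the bundled llama.cpp is an assumption, so verify both against the project's CMakeLists.txt):

```sh
# Configure from a fresh build directory; option names as quoted in this thread
mkdir build && cd build
cmake .. -DMINIGPT4_CUBLAS=ON -DLLAMA_CUBLAS=ON
cmake --build . --config Release
```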

deadpipe commented 8 months ago

@Maknee

I tried setting option(MINIGPT4_CUBLAS "minigpt4: use cuBLAS" ON) in the CMakeLists.txt.

But when I run cmake --build . --config Release,

I get the error below, unfortunately:

[screenshot: cmd.exe build error, 24/11/2023 00:27:05]

Any advice on how to deal with this is highly appreciated.
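One common cause of build failures after toggling an option directly inside CMakeLists.txt is a stale CMake cache. A sketch of a clean reconfigure, passing the option on the command line instead of editing the file (general CMake practice, not advice confirmed by this thread):

```sh
# Delete the stale build tree (or at least CMakeCache.txt) so the changed
# option is picked up; on Windows cmd.exe, use `rmdir /s /q build` instead
rm -rf build
mkdir build && cd build
cmake .. -DMINIGPT4_CUBLAS=ON
cmake --build . --config Release
```

Passing the option with -D also keeps the source tree unmodified and records the choice in the CMake cache, so later rebuilds pick it up consistently.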