abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Specify GPU Selection (e.g., CUDA:0, CUDA:1) #1816

Open RakshitAralimatti opened 3 weeks ago

RakshitAralimatti commented 3 weeks ago

Hi,

Is there a way to specify which GPU to use for inference, such as restricting it to only cuda:0 or cuda:1 in the code? Or are there any workarounds for achieving this?

Thanks in advance.

ExtReMLapin commented 3 weeks ago

You can use a tensor split of `[1, 0, 0]` to ignore CUDA devices 1 and 2 and keep everything on device 0.

Also set the split mode to none to improve performance when everything stays on a single GPU.

RakshitAralimatti commented 3 weeks ago

Hi @ExtReMLapin, thanks for your reply! I tried what you suggested but got stuck. Could you please elaborate in more detail?

ExtReMLapin commented 3 weeks ago

They're arguments to the `Llama` class.
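For reference, a minimal sketch of the two approaches discussed in this thread: hiding GPUs from the process via `CUDA_VISIBLE_DEVICES` (a common workaround that must happen before the CUDA runtime loads), and steering llama-cpp-python itself through the `Llama` constructor's `tensor_split`, `split_mode`, and `main_gpu` arguments. The model path here is a hypothetical placeholder, and the `Llama(...)` call is left commented out since it needs an actual GGUF file.

```python
import os

# Option A (workaround): expose only GPU 0 to this process.
# Must be set before llama_cpp (and thus the CUDA runtime) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Option B: keep all GPUs visible and pin the model via Llama arguments.
# "model.gguf" is a hypothetical path; adjust to your local file.
llama_kwargs = dict(
    model_path="model.gguf",
    n_gpu_layers=-1,               # offload all layers to the GPU
    tensor_split=[1.0, 0.0, 0.0],  # all weights on device 0, none on 1 and 2
    split_mode=0,                  # llama_cpp.LLAMA_SPLIT_MODE_NONE
    main_gpu=0,                    # run compute on device 0
)

# from llama_cpp import Llama
# llm = Llama(**llama_kwargs)     # requires llama-cpp-python and a model file
```

Note that option A works with any CUDA application, while option B keeps all devices visible so other code in the same process can still use them.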

RakshitAralimatti commented 2 weeks ago

@ExtReMLapin Got it, thanks!