RakshitAralimatti opened this issue 3 weeks ago
You can pass `tensor_split=[1, 0, 0]` to ignore CUDA devices 1 and 2 and keep everything on device 0.
Also set `split_mode` to none to improve performance when the model stays on a single GPU.
Hi @ExtReMLapin, thanks for your reply! I tried what you suggested but got stuck, so could you please elaborate in more detail?
They're arguments of the `Llama` class.
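For reference, a minimal sketch of the relevant `Llama` keyword arguments. The model path is hypothetical, and `split_mode=0` assumes the numeric value of `LLAMA_SPLIT_MODE_NONE` in current llama.cpp headers; check your installed version.

```python
# Keyword arguments for llama_cpp.Llama that pin inference to CUDA device 0.
gpu_kwargs = dict(
    n_gpu_layers=-1,               # offload all layers to the GPU
    main_gpu=0,                    # run the main computation on device 0
    tensor_split=[1.0, 0.0, 0.0],  # place 100% of the weights on device 0
    split_mode=0,                  # LLAMA_SPLIT_MODE_NONE: no cross-GPU split
)

# Usage (assumes llama-cpp-python is installed and a local GGUF file exists):
# from llama_cpp import Llama
# llm = Llama(model_path="model.gguf", **gpu_kwargs)
```

With `split_mode` set to none, llama.cpp skips the layer/row distribution logic entirely, which is what gives the single-GPU speedup mentioned above.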
@ExtReMLapin Got it, thanks!
Hi,

Is there a way to specify which GPU to use for inference, such as restricting it to only `cuda:0` or `cuda:1`, in the code? Or are there any workarounds for achieving this? Thanks in advance.
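One common workaround, independent of any library-specific arguments, is to hide the other GPUs from CUDA at the process level. This must happen before any CUDA-backed library initializes the driver:

```python
import os

# Expose only physical GPU 0 to this process; every CUDA library loaded
# afterwards will see it as the sole device (cuda:0).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

The device can also be set from the shell instead, e.g. `CUDA_VISIBLE_DEVICES=1 python run.py` to use only the second GPU.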