vvdb-architecture opened this issue 9 months ago
It seems that the CPU back-end and the CUDA back-end can't be installed at the same time.
If this is by design, the issue can be closed, but since I don't know if this is by design, I'll leave the issue open for others to comment.
Originally they weren't meant to be installed together, since it then wasn't clear which binaries should be used. However we now have the runtime detection which should probe your system and load the best binaries possible.
In your case that looks like it is working, since the logs say:
[LLamaSharp Native] [Info] ./runtimes/win-x64/native/cuda12/libllama.dll is selected and loaded successfully.
But then for some reason it isn't actually using your GPU! I think this is probably a real bug.
Hi @vvdb-architecture , if it is not using your GPU even with the Cuda backend do you have your GpuLayerCount in your ModelParams set to -1, or 1-33? If it is not set or set to 0 it will default to cpu-only, even with just the Cuda backend installed. Sorry if I misunderstand your problem, but this may help other users if they have that issue:
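For reference, the suggestion above could look like this in code — a minimal sketch assuming LLamaSharp's `ModelParams` API, with a hypothetical model path:

```csharp
using LLama;
using LLama.Common;

// "path/to/model.gguf" is a hypothetical path for illustration.
var parameters = new ModelParams("path/to/model.gguf")
{
    // 0 keeps everything on the CPU even with the CUDA backend installed;
    // -1 or a positive count (e.g. 33 for this model) offloads layers to the GPU.
    GpuLayerCount = 33
};

using var weights = LLamaWeights.LoadFromFile(parameters);
```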
It's set to 33.
I think this issue can be closed, since the docs explicitly state you can only install one of the back-ends.
@vvdb-architecture Sorry for seeing this issue so late. It should be my duty to resolve this problem, because I wrote the main part of the dynamic loading of the native library. #588 is also a duplicate of this issue.
since the docs explicitly states you can only install one of the back-ends
Yes, but the documentation has been outdated for a long time. It is still at v0.5.0, while we are already proceeding to v0.11.0. I stated that restriction in the docs because dynamic loading was not supported in v0.5.0.
LLamaSharp is expected to work with multiple backend packages in the current version, so I'll re-open this issue and dig into it. Thank you for the reminder in #589!
Hello, is there any news on this issue?
I encountered a similar issue. I have installed both LLamaSharp.Backend.CPU and LLamaSharp.Backend.Cuda12.Windows (version 0.18.0). Following the README, I added the following line to show which native library file is loaded:

```csharp
NativeLibraryConfig.Instance.WithLogCallback(delegate (LLamaLogLevel level, string message) { Console.Write($"{level}: {message}"); });
```

When I load a model on the CPU with GpuLayerCount set to 0, the CUDA backend is loaded:
Info: NativeLibraryConfig Description:
- LibraryName: LLama
- Path: ''
- PreferCuda: True
- PreferVulkan: True
- PreferredAvxLevel: AVX2
- AllowFallback: True
- SkipCheck: False
- SearchDirectories and Priorities: { ./ }
Debug: Got relative library path 'runtimes/win-x64/native/cuda12/llama.dll' from local with (NativeLibraryName: LLama, UseCuda: True, UseVulkan: False, AvxLevel: None), trying to load it...
Debug: Found full path file './runtimes/win-x64/native/cuda12/llama.dll' for relative path 'runtimes/win-x64/native/cuda12/llama.dll'
Info: Successfully loaded './runtimes/win-x64/native/cuda12/llama.dll'
When I install only LLamaSharp.Backend.CPU, the correct native library file is loaded:
- PreferCuda: True
- PreferVulkan: True
- PreferredAvxLevel: AVX2
- AllowFallback: True
- SkipCheck: False
- SearchDirectories and Priorities: { ./ }
Debug: Got relative library path 'runtimes/win-x64/native/cuda12/llama.dll' from local with (NativeLibraryName: LLama, UseCuda: True, UseVulkan: False, AvxLevel: None), trying to load it...
Debug: Found full path file 'runtimes/win-x64/native/cuda12/llama.dll' for relative path 'runtimes/win-x64/native/cuda12/llama.dll'
Info: Failed Loading 'runtimes/win-x64/native/cuda12/llama.dll'
Debug: Got relative library path 'runtimes/win-x64/native/vulkan/llama.dll' from local with (NativeLibraryName: LLama, UseCuda: False, UseVulkan: True, AvxLevel: None), trying to load it...
Debug: Found full path file 'runtimes/win-x64/native/vulkan/llama.dll' for relative path 'runtimes/win-x64/native/vulkan/llama.dll'
Info: Failed Loading 'runtimes/win-x64/native/vulkan/llama.dll'
Debug: Got relative library path 'runtimes/win-x64/native/avx2/llama.dll' from local with (NativeLibraryName: LLama, UseCuda: False, UseVulkan: False, AvxLevel: Avx2), trying to load it...
Debug: Found full path file './runtimes/win-x64/native/avx2/llama.dll' for relative path 'runtimes/win-x64/native/avx2/llama.dll'
Info: Successfully loaded './runtimes/win-x64/native/avx2/llama.dll'
When I load a model on the CPU with GpuLayerCount equals to 0, the cuda backend is loaded
That's how it's meant to work - if the CUDA binaries are available and compatible with your system, they will be used unless you explicitly disable CUDA at load time with NativeLibraryConfig.All.WithCuda(false).
Changing GpuLayerCount changes how many layers are sent to the GPU, but does not change which backend is used. Setting it to zero should be equivalent to not using CUDA at all (although possibly slightly slower than the pure CPU binaries).
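A sketch of explicitly disabling CUDA at load time, using the `NativeLibraryConfig.All.WithCuda(false)` call mentioned above (the configuration must run before the first native library is loaded):

```csharp
using LLama.Native;

// Must be called before any LLamaSharp API triggers native library loading;
// afterwards the config is frozen and the CPU binaries will be selected.
NativeLibraryConfig.All.WithCuda(false);
```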
I'm using Kernel-memory with LLamaSharp. Despite having an RTX 3080 and the latest CUDA drivers installed, CUDA is not used.
Not sure if this is a bug or I'm missing something, so here's a question instead:
The LlamaSharp.csproj contains
I found out that if both the Cpu and Cuda12 back-ends are referenced, only the CPU is used even though the CUDA DLL is loaded. Interestingly, the logs do say that the CUDA back-end is loaded, but no CUDA is actually used.
If I remove the reference to LLamaSharp.Backend.Cpu, then the CUDA back-end will start to be used. The logs show:
I've reported this to the kernel memory project, but was advised to report this here.
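For illustration, a project file referencing only the CUDA backend (package names are taken from this thread, and the version is the 0.18.0 mentioned above) might look like this:

```xml
<ItemGroup>
  <!-- LLamaSharp.Backend.Cpu is intentionally not referenced here. -->
  <PackageReference Include="LLamaSharp" Version="0.18.0" />
  <PackageReference Include="LLamaSharp.Backend.Cuda12.Windows" Version="0.18.0" />
</ItemGroup>
```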