JonBoyleCoding opened this issue 9 months ago
Sorry, but I don't have much of an idea about what's going wrong. It seems likely that ollama simply doesn't support that gpu currently. This "nvml vram init failure: 18" is similar to the message I get on my amd gpu when built with cuda ("nvml vram init failure: 9"), so I can only guess that ollama isn't set up to recognize/use your gpu, or that libnvidia-ml isn't properly detecting it.
It is also possible that the error is caused by building with incompatible library versions, or by a missing library that should be exposed to ollama at runtime. Unfortunately, my knowledge of ollama, go, c++, etc. is too superficial for me to tell what the likely cause is.
I did encounter an issue that seems superficially related, and I opened an issue with ollama. It's probably actually unrelated, but maybe it would be of interest?
Ultimately, I would recommend opening an issue with ollama, since the maintainers there would hopefully know better about what's going wrong (even if it is my nix package that's actually at fault).
> I'm still working on a flake from prior to your changes to wrap around your nixpkgs fork (as of this moment, that doesn't work for me).
Would you be willing to open an issue about this with more detail, so I can try to fix it? Does it not build, or does it build but not detect any gpu?
I'll open an issue with ollama then. Thanks for your thoughts. I'll try and build the new version again and get back to you in a separate issue.
Just to note @abysssol, I tried your most up-to-date version again. I realised what the issue was: I was having nixpkgs follow the unstable branch of nixpkgs (which, at the moment, you have pointing to your separate repository until the PR lands).
It's still building at the moment, but I imagine there won't be any issues now! I'll let you know if it fails any further.
Sorry to re-open this issue.
I noticed in your issue that you set the OLLAMA_DEBUG variable. I thought I'd give it a shot, and here's the relevant output:
time=2024-02-14T22:41:55.626Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/nix/store/z6557r7pgvmxr9x16a4ffazly8dflh65-nvidia-x11-545.29.06-6.1.77/lib/libnvidia-ml.so.545.29.06]"
wiring nvidia management library functions in /nix/store/z6557r7pgvmxr9x16a4ffazly8dflh65-nvidia-x11-545.29.06-6.1.77/lib/libnvidia-ml.so.545.29.06
dlsym: nvmlInit_v2
dlsym: nvmlShutdown
dlsym: nvmlDeviceGetHandleByIndex
dlsym: nvmlDeviceGetMemoryInfo
dlsym: nvmlDeviceGetCount_v2
dlsym: nvmlDeviceGetCudaComputeCapability
dlsym: nvmlSystemGetDriverVersion
dlsym: nvmlDeviceGetName
dlsym: nvmlDeviceGetSerial
dlsym: nvmlDeviceGetVbiosVersion
dlsym: nvmlDeviceGetBoardPartNumber
dlsym: nvmlDeviceGetBrand
nvmlInit_v2 err: 18
time=2024-02-14T22:41:55.630Z level=INFO source=gpu.go:300 msg="Unable to load CUDA management library /nix/store/z6557r7pgvmxr9x16a4ffazly8dflh65-nvidia-x11-545.29.06-6.1.77/lib/libnvidia-ml.so.545.29.06: nvml vram init failure: 18"
This shows that the error is specifically coming from nvmlInit_v2, and the NVML documentation states the following:
> NVML_ERROR_LIB_RM_VERSION_MISMATCH = 18
> RM detects a driver/library version mismatch.
Investigating further, the driver libraries I have in the /run/opengl-driver/lib directory are version 545.29.02, whereas above you can see the library being loaded is 545.29.06. It's only a minor version difference, but perhaps this is where the error is coming from.
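If that mismatch really is the cause, then presumably pinning the nvidia library that ollama loads to the driver package my system actually runs would fix it. A rough, untested sketch of what I mean (reusing the linuxPackages override argument from the cuda package, and assuming access to config.hardware.nvidia.package from the NixOS config):
let
  # untested sketch: make ollama see the same nvidia userspace driver as /run/opengl-driver
  ollama = ollama-abysssol.cuda.override {
    linuxPackages = pkgs.linuxPackages // {
      nvidia_x11 = config.hardware.nvidia.package;
    };
  };
in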
This is probably an issue that will be solved once everything ends up in nixpkgs, but I thought I'd try and dive into the rabbit hole and see if I was correct. Unfortunately, I've hit one too many stumbling blocks and it's getting late where I am, so here's where I've got to.
So it seems my issue COULD be coming from a mismatch, in that my system is on nixpkgs-stable and yours is currently based on nixpkgs-master. What's weird, though, is that I've never had this issue with other deep learning libraries in the past when I base my flakes on nixpkgs-unstable.
I've tried overriding a bit and got stuck (pkgs here refers to nixpkgs-stable):
let
ollama = ollama-abysssol.cuda.override {
cudaGcc = pkgs.gcc11;
cudaPackages = pkgs.cudaPackages;
linuxPackages = pkgs.linuxPackages;
};
in
Ends with:
> + g++ -fPIC -g -shared -o ../llama.cpp/build/linux/x86_64/cuda_v11/lib/libext_server.so -Wl,--whole-archive ../llama.cpp/build/linux/x86_64/cuda_v11/examples/server/libext_server.a -Wl,--no-whole-archive ../llama.cpp/build/linux/x86_64/cuda_v11/common/libcommon.a ../llama.cpp/build/linux/x86_64/cuda_v11/libllama.a '-Wl,-rpath,$ORIGIN' -lpthread -ldl -lm -L/nix/store/z23gdb356jkbf3nl91c0mk4al1dl81pr-cuda-toolkit/lib -lcudart -lcublas -lcublasLt -lcuda
> /nix/store/idiaraknw071d20nlqp49s18gbvw4wa0-binutils-2.40/bin/ld: cannot find -lcuda: No such file or directory
> collect2: error: ld returned 1 exit status
And it's certainly right - there is no libcuda in that cuda-toolkit directory, so I'm guessing there has been a change in how cuda organises its libraries.
If you have any thoughts on how to go about testing this I'd appreciate it - but no worries if not. Ultimately I seem to get stuck diving into rabbit holes like this ;)
> Sorry to re-open this issue.
Don't be; this is exactly what reopening issues is meant for. I'm actually rather excited to see that this problem may have a solution after all. It's also getting late where I am though, so I'll take a closer look tomorrow.
Sorry for not getting to this today, I've been working on getting ollama 0.1.24 merged into upstream nixpkgs.
My hope is that once ollama is available from upstream nixpkgs, all library and driver versions should match, so hopefully your gpu will work then. I've changed back to vendoring the ollama module instead of using it from my nixpkgs fork, so you can override nixpkgs with your stable version again.
ollama = {
url = "github:abysssol/ollama-flake";
inputs.nixpkgs.follows = "nixpkgs";
};
Then, try overriding linuxPackages with whatever kernel you use; that's where libnvidia-ml comes from, and maybe different kernel versions use different lib versions.
let
ollama = ollama-abysssol.cuda.override {
# if you use the zen kernel; otherwise use the relevant kernel packages. omit if using the default kernel
linuxPackages = pkgs.linuxPackages_zen;
};
in
Maybe that could make a difference? It looks like you already did something similar, so maybe not ... I don't know.
By the way, libcuda is from cudaPackages.cuda_cudart. Maybe it's not there on stable?
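If it is missing there, one workaround to try (just a sketch, assuming your flake has a second input named nixpkgs-unstable) would be taking only cudaPackages from unstable while keeping everything else on stable:
let
  # sketch: take the CUDA toolkit from an unstable nixpkgs input, keep the rest on stable
  pkgsUnstable = import nixpkgs-unstable {
    system = "x86_64-linux";
    config.allowUnfree = true; # the cuda packages are unfree
  };
  ollama = ollama-abysssol.cuda.override {
    cudaPackages = pkgsUnstable.cudaPackages;
  };
in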
Could you also post a file tree of /run/opengl-driver/lib/? i.e. with tree /run/opengl-driver/lib/ or exa --tree /run/opengl-driver/lib/. /sys/module/nvidia/ may also have something useful.
I think I'll have more time tomorrow to try to figure this out. For Real This Time™
I created a new branch, cuda-testing, that changes how it splices together the cuda-toolkit libs. Could you try building that and see if it makes a difference?
ollama = {
url = "github:abysssol/ollama-flake/cuda-testing";
inputs.nixpkgs.follows = "nixpkgs";
};
Unfortunately, I have thus far been unable to find any more information than what you already did.
Apologies for not responding - I have a number of reports/papers I'm working on at the moment!
I just had a chance to try. It appears the patches fail; I'm going into a meeting now so I can't debug, but I can leave you with a log for now.
Running phase: patchPhase
applying patch /nix/store/liqb6g8spk497dz0bsxlp4bmadr4189c-remove-git.patch
patching file llm/generate/gen_common.sh
applying patch /nix/store/53fs5wbc3lq27pkcdhg65q9gkf0z8g88-replace-gcc.patch
patching file llm/generate/gen_common.sh
Hunk #1 succeeded at 89 (offset 3 lines).
applying patch /nix/store/65f7ahf1i5m7d1j6l6is50aq93snl0ac-01-cache.diff
patching file llm/llama.cpp/examples/server/server.cpp
applying patch /nix/store/x1jg303zsxd6zzs3k8bkxdn5ykhbh5l3-02-shutdown.diff
patching file llm/llama.cpp/examples/server/server.cpp
Hunk #2 succeeded at 2433 (offset 38 lines).
Hunk #3 succeeded at 3057 (offset 39 lines).
patching file llm/llama.cpp/examples/server/utils.hpp
substituteStream(): ERROR: Invalid command line argument: --replace-fail
/nix/store/i0l5falbdsbfl1lgypdp1jda672bdjw3-stdenv-linux/setup: line 131: pop_var_context: head of shell_variables not a function context
> Apologies for not responding
No worries. It's good to prioritize things that actually matter to you. I'll do the same. To put it in perspective, we're just trying to get a chatbot to run a bit faster. Not an especially urgent endeavor.
> substituteStream(): ERROR: Invalid command line argument: --replace-fail
The failure was an oversight of mine: I left in an argument to substituteInPlace that had only been added in unstable nixpkgs.
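For reference, the difference is just the flag: substituteInPlace on unstable accepts a variant that fails the build if the pattern isn't found, which stable doesn't know about yet. The file and strings below are purely illustrative, not the actual patch:
# purely illustrative, not the actual substitution used by the flake
postPatch = ''
  # unstable nixpkgs: errors out if "old-string" isn't present in the file
  substituteInPlace some/file.sh --replace-fail "old-string" "new-string"

  # what stable nixpkgs understands: silently continues if "old-string" isn't found
  substituteInPlace some/file.sh --replace "old-string" "new-string"
'';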
This is my problem too. I'm redefining nvidia-x11 in my nixos config like this:
hardware.nvidia.package = pkgs.linuxPackages_cachyos.nvidia_x11.overrideAttrs (s: rec {
version = "550.40.07";
name = (builtins.parseDrvName s.name).name + "-" + version;
src = pkgs.fetchurl {
url = "https://download.nvidia.com/XFree86/Linux-x86_64/${version}/NVIDIA-Linux-x86_64-${version}.run";
sha256 = "298936c727b7eefed95bb87eb8d24cfeef1f35fecac864d98e2694d37749a4ad";
};
});
That is different from what's in nixpkgs itself, so ollama fails because it's built with 545, the default version in nixpkgs. How can I change the version to this one in the input definition of ollama?
I'm not sure if this will work, but I think you can just override nvidia_x11 with your custom driver. Try it and tell me if it works.
{ pkgs, lib, config, ollama }: # add `config` if it's not already an argument
let
system = "x86_64-linux";
ollamaCuda = (ollama.packages.${system}.cuda.override {
linuxPackages.nvidia_x11 = config.hardware.nvidia.package;
});
in
{
# if you're using the service in nixos-unstable
services.ollama.package = ollamaCuda;
# otherwise, put it in system packages
environment.systemPackages = [
ollamaCuda
];
}
inputs = {
ollama.url = "github:abysssol/ollama-flake";
};
Hmm, I think that is the way, but it seems the stdenv still doesn't get libcuda:
/nix/store/idiaraknw071d20nlqp49s18gbvw4wa0-binutils-2.40/bin/ld: cannot find -lcuda: No such file or directory
Working!!!!!!!!!!! With @abysssol's suggestion, only without adding inputs.nixpkgs.follows = "nixpkgs";.
I'm using a full flake system, so I have my OS config and my home-manager config in my own repo. So, in the end:
In the main flake.nix, in inputs:
ollama = {
url = "github:abysssol/ollama-flake";
inputs.utils.follows = "flake-utils";
};
Later, I declare an overlay for this ollama:
overlay-ia = final: prev: {
ia = {
ollama = ollama.packages.${system}.cuda.override {
linuxPackages.nvidia_x11 = pkgs.linuxPackages_cachyos.nvidia_x11.overrideAttrs (s: rec {
version = "550.40.07";
name = (builtins.parseDrvName s.name).name + "-" + version;
src = pkgs.fetchurl {
url = "https://download.nvidia.com/XFree86/Linux-x86_64/${version}/NVIDIA-Linux-x86_64-${version}.run";
sha256 = "298936c727b7eefed95bb87eb8d24cfeef1f35fecac864d98e2694d37749a4ad";
};
});
};
};
};
Of course, this nvidia package must be the same one declared in my OS config:
hardware.nvidia.package = pkgs.linuxPackages_cachyos.nvidia_x11.overrideAttrs (s: rec {
version = "550.40.07";
name = (builtins.parseDrvName s.name).name + "-" + version;
src = pkgs.fetchurl {
url = "https://download.nvidia.com/XFree86/Linux-x86_64/${version}/NVIDIA-Linux-x86_64-${version}.run";
sha256 = "298936c727b7eefed95bb87eb8d24cfeef1f35fecac864d98e2694d37749a4ad";
};
});
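For completeness, wiring the overlay in looks roughly like this (a sketch; the exact option paths depend on how the flake passes things to home-manager, and overlay-ia has to be in scope, e.g. via specialArgs):
{ pkgs, ... }:
{
  # register the overlay defined above
  nixpkgs.overlays = [ overlay-ia ];

  # the overridden cuda build is then available through the overlay
  home.packages = [ pkgs.ia.ollama ];
}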
Then I add this overlay to my nixpkgs definition, as sketched above, and use the package in my home-manager package definitions. And voilà:
time=2024-02-24T18:36:23.272-03:00 level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 8.6"
thanks @abysssol
I'm glad to hear you got it working.
Is there a reason why you're duplicating the definition of your custom driver instead of using a let binding? It seems like it could cause issues if they ever get out of sync.
> I'm glad to hear you got it working.
> Is there a reason why you're duplicating the definition of your custom driver instead of using a let binding? It seems like it could cause issues if they ever get out of sync.
Nonono, no reason, heheheh - that's just how I was testing. Now I'm using a common let binding in the main flake to define the desired nvidia driver.
Hi - I just wondered if you had some thoughts. I have a machine with some NVIDIA 2080 Supers in it that, for some reason, doesn't detect the GPU and launches in CPU-only mode. I'm happy to go over to Ollama directly if you're not sure; however, I thought you might have come across this, so it seemed worth asking first. Thanks for looking at this either way.
I'm still working on a flake from prior to your changes to wrap around your nixpkgs fork (as of this moment, that doesn't work for me). I tried using the gpu/cuda package. I've used this on another machine and it works flawlessly (thanks for your hard work on this!).
I noticed I have this here:
To clarify, I'm able to run pytorch with CUDA on a GPU from within a flake, so I believe the system is set up correctly. Again, I'm happy to open an issue with Ollama if you don't believe you can help.
Full log of opening ollama serve.