NixOS / nixpkgs


NixOS - ollama - NVIDIA GeForce RTX 3060 CUDA not working #326913

Closed: asosnovsky closed this issue 3 months ago

asosnovsky commented 3 months ago

Describe the bug

Bought this GPU specifically because it was listed at the top of the recommended hardware for ollama. I tried to enable ollama as well as the overall NVIDIA support, but nothing seems to be picking up the device.

Here is my config: https://github.com/asosnovsky/nixos-setup/blob/main/hosts/hl-bigbox1.nix

Here is my device info

Gigabyte Technology Co., Ltd. B550 GAMING X V2
│
├─AMD Ryzen 5 3600 6-Core Processor:
│ │   Device ID:          4bde70ba4e39b28f9eab1628f9dd6e6244c03027
│ │   Current version:    0x08701030
│ │   Vendor:             Advanced Micro Devices, Inc.
│ │   GUIDs:              b54f3c89-8a41-5b9f-9a5c-0367c19a9451 ← CPUID\PRO_0&FAM_17&MOD_71
│ │                       d951a23c-a920-50ac-ad89-0660aa0be95c ← CPUID\PRO_0&FAM_17&MOD_71&STP_0
│ │   Device Flags:       • Internal device
│ │
├─GA106 [GeForce RTX 3060 Lite Hash Rate]:
│     Device ID:          08740947f5235290dc47990eb8e3468dad7fe6b8
│     Current version:    a1
│     Vendor:             NVIDIA Corporation (PCI:0x10DE, PCI:0x1022)
│     GUIDs:              2954a00b-f08f-56ad-81f7-7e22bedfdfc7 ← PCI\VEN_10DE&DEV_2504
│                         533f2b03-1243-530c-a168-9706751fc035 ← PCI\VEN_10DE&DEV_2504&SUBSYS_1462397D
│                         2d530482-50db-5cbc-97f7-6ae02507276d ← PCI\VEN_1022&DEV_1483
│                         0b05f0c1-7f0b-59e0-adb4-0e58b9b6e7e3 ← PCI\VEN_1022&DEV_1483&SUBSYS_10221453
│     Device Flags:       • Internal device
│                         • Cryptographic hash verification is available
│
├─TPM:
│     Device ID:          c6a80ac3a22083423992a3cb15018989f37834d6
│     Summary:            TPM 2.0 Device
│     Current version:    3.87.0.5
│     Vendor:             Advanced Micro Devices, Inc. (TPM:AMD)
│     GUIDs:              9305de1c-1e12-5665-81c4-37f8e51219b8 ← TPM\VEN_AMD&DEV_0001
│                         78a291ae-b499-5b0f-8f1d-74e1fefd0b1c ← TPM\VEN_AMD&MOD_AMD
│                         65a3fced-b423-563f-8098-bf5c329fc063 ← TPM\VEN_AMD&DEV_0001&VER_2.0
│                         5e704f0d-83cb-5364-8384-f46d725a23b8 ← TPM\VEN_AMD&MOD_AMD&VER_2.0
│     Device Flags:       • Internal device
│                         • System requires external power source
│                         • Needs a reboot after installation
│                         • Device can recover flash failures
│                         • Full disk encryption secrets may be invalidated when updating
│                         • Signed Payload
│
└─WD BLACK SN770 250GB:
      Device ID:          71b677ca0f1bc2c5b804fa1d59e52064ce589293
      Summary:            NVM Express solid state drive
      Current version:    731030WD
      Vendor:             Sandisk Corp (NVME:0x15B7)
      Serial Number:      22196Y802128
      GUIDs:              1524d43d-ed91-5130-8cb6-8b8478508bae ← NVME\VEN_15B7&DEV_5017
                          87cfda90-ce08-52c3-9bb5-0e0718b7e57e ← NVME\VEN_15B7&DEV_5017&SUBSYS_15B75017
                          36698e2e-ae33-573d-a6c0-798393b361ef ← WD_BLACK SN770 250GB
      Device Flags:       • Internal device
                          • Updatable
                          • System requires external power source
                          • Needs a reboot after installation
                          • Device is usable for the duration of the update

Here is my nix info

Steps To Reproduce

  services.ollama = {
    enable = true;
    acceleration = "cuda";
    listenAddress = "0.0.0.0:11434";
    environmentVariables = {
      OLLAMA_LLM_LIBRARY = "cuda";
    };
  };

Screenshots

Ollama serve logs from systemctl

time=2024-07-13T16:44:09.180-04:00 level=INFO source=routes.go:1054 msg="Listening on 127.0.0.1:11434 (version 0.1.38)"
time=2024-07-13T16:44:09.180-04:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/run/user/1000/ollama2290230408/runners
time=2024-07-13T16:44:13.920-04:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v12]"
time=2024-07-13T16:44:13.963-04:00 level=INFO source=types.go:71 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="31.3 GiB" available="5.9 GiB"

and while running ollama run llama3:8b

root@hl-bigbox1:/home/ari/nixos-setup/ > ollama ps
NAME            ID              SIZE    PROCESSOR       UNTIL
llama3:8b       365c0bd3c000    4.9 GB  100% CPU        4 minutes from now

VeilSilence commented 3 months ago

Jul 14 01:50:26 Nix ollama[1087239]: time=2024-07-14T01:50:26.372+03:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v12]"
Jul 14 01:50:26 Nix ollama[1087239]: time=2024-07-14T01:50:26.473+03:00 level=INFO source=types.go:98 msg="inference compute" id=GPU-0a730a3d-f915-32f2-19e0-bef38f6809b4 library=cuda compute=8.6 driver=12.5 name="NVIDIA GeForce RTX 3090" total="23.8 GiB" available="22.3 GiB"

Try changing the settings to something like this:


    environmentVariables = {
      OLLAMA_LLM_LIBRARY = "cuda";
      LD_LIBRARY_PATH = "/run/opengl-driver/lib";
    };
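
For context, that fragment goes inside the existing services.ollama block. A minimal sketch of the combined config, using only options already shown in this thread (the leading slash on /run/opengl-driver/lib is the standard NixOS path for the driver libraries):

services.ollama = {
  enable = true;
  acceleration = "cuda";
  environmentVariables = {
    OLLAMA_LLM_LIBRARY = "cuda";
    LD_LIBRARY_PATH = "/run/opengl-driver/lib";
  };
};
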
asosnovsky commented 3 months ago

@VeilSilence are you on unstable or nixos-24?

VeilSilence commented 3 months ago

Unstable.

asosnovsky commented 3 months ago

@VeilSilence just switched my configs to use unstable, still the same.. :(

Mind sharing your configs?

abysssol commented 3 months ago

Maybe you should try setting services.xserver.videoDrivers = ["nvidia"];? It seems to influence services.xserver.drivers, which could potentially be read by other NixOS modules even if you don't use xserver.
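
For reference, a minimal sketch of what enabling the proprietary driver could look like. The option names come from the standard NixOS NVIDIA modules; the specific driver package choice is an assumption for illustration, not taken from this thread:

# Sketch: make the NVIDIA userspace libraries (and /run/opengl-driver/lib) available system-wide.
services.xserver.videoDrivers = [ "nvidia" ];
hardware.opengl.enable = true;
hardware.nvidia.modesetting.enable = true;
# Assumes `config` is in scope, as in any NixOS module.
hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.stable;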

Have you tried a minimal ollama config with just the following? Maybe the listen address or one of the env vars is actually interfering?

services.ollama = {
  enable = true;
  acceleration = "cuda";
};

Also, if you just want to use unstable ollama, I would recommend the following on a base of stable nixos-24.05, where unstable is imported from a separate nixpkgs-unstable flake input:

services.ollama = {
  enable = true;
  acceleration = "cuda";
  package = unstable.ollama;
};
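
For completeness, a hedged sketch of how that `unstable` package set might be wired up in a flake; the nixpkgs-unstable input name and the specialArgs approach are illustrative assumptions, not taken from this thread:

# flake.nix (sketch): expose an unstable package set to the module above via specialArgs.
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";
  inputs.nixpkgs-unstable.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { nixpkgs, nixpkgs-unstable, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      specialArgs = {
        # Unfree must be allowed for the NVIDIA/CUDA bits.
        unstable = import nixpkgs-unstable {
          system = "x86_64-linux";
          config.allowUnfree = true;
        };
      };
      # configuration.nix then takes `unstable` as a module argument.
      modules = [ ./configuration.nix ];
    };
  };
}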

Also, are you aware that your enableNetowrking option is misspelled, with the wo of Networking swapped to ow? The same goes for homeMangerVersion, where Manager has lost an a and become Manger.

asosnovsky commented 3 months ago

@abysssol thank you for catching those (embarrassing). I figured out the issue... I never removed the old GPU I had in this machine (I had only disconnected its power). Looks like properly taking it out fixed my issues.

jasper-clarke commented 2 months ago

@asosnovsky @abysssol I have an RTX 3060 as well and Ollama isn't using my GPU, even though it is being detected! I don't have any other GPUs, and the iGPU of my CPU (Intel i5-12600) is disabled in the BIOS.

My config:

    ollama = {
      enable = true;
      acceleration = "cuda";
      environmentVariables = {
        OLLAMA_LLM_LIBRARY = "cuda";
        LD_LIBRARY_PATH = "run/opengl-driver/lib";
      };
    };

I am getting the same 100% CPU log from Ollama as at the top of this issue.

Journalctl logs:

Jul 10 09:43:05 nixos ollama[1989]: time=2024-07-10T09:43:05.270+10:00 level=INFO source=routes.go:1111 msg="Listening on 127.0.0.1:11434 (version 0.1.47)"
Jul 10 09:43:05 nixos ollama[1989]: time=2024-07-10T09:43:05.271+10:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3450513807/runners
Jul 10 09:43:09 nixos ollama[1989]: time=2024-07-10T09:43:09.081+10:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v12 cpu cpu_avx cpu_avx2]"
Jul 10 09:43:09 nixos ollama[1989]: time=2024-07-10T09:43:09.148+10:00 level=INFO source=types.go:98 msg="inference compute" id=GPU-6814f4a1-a623-257e-fdcf-1da7dbff1e59 library=cuda compute=8.6 driver=12.2 name="NVIDIA GeForce RTX 3060" total="11.8 GiB" available="11.6 GiB"
asosnovsky commented 2 months ago

Honestly, after I did another nix flake update about a week ago, this stopped working for me again. I think because ollama is still niche, the developer working on this may not have had the time to support it well. I managed to get this working properly by passing the NVIDIA GPU through to Docker (which is more widely used and supported) and then spinning up ollama with the official Docker image.

See my container config https://github.com/asosnovsky/nixos-setup/blob/753f21b3fe95b5d0259c37a5186900fd19bfa5c1/hosts/hl-bigbox1.nix#L18

And the Docker NVIDIA passthrough: https://github.com/asosnovsky/nixos-setup/blob/753f21b3fe95b5d0259c37a5186900fd19bfa5c1/hosts/hl-bigbox1.nix#L6
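
Roughly, that setup could look like the following. This is a hedged sketch rather than the linked config verbatim; the container name, port mapping, and volume path are illustrative assumptions:

# Sketch: GPU passthrough to Docker plus the official ollama image.
virtualisation.docker.enable = true;
hardware.nvidia-container-toolkit.enable = true;  # exposes the NVIDIA GPU to containers

virtualisation.oci-containers = {
  backend = "docker";
  containers.ollama = {
    image = "ollama/ollama:latest";
    ports = [ "11434:11434" ];
    volumes = [ "/var/lib/ollama:/root/.ollama" ];
    extraOptions = [ "--gpus=all" ];
  };
};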

jasper-clarke commented 2 months ago

Thanks @asosnovsky, I'll give that Docker container setup a try! Edit: Works perfectly, thank you. I just had to reinstall my models with docker exec ollama ollama pull my-model

asosnovsky commented 2 months ago


You can install the ollama CLI via pkgs.ollama and set the OLLAMA_HOST env var to point at the container. That way you can manage it with the standard CLI tool (or any GUI).
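
A small sketch of that host-side setup; the address is an assumption and should match wherever the container's port is published:

# Sketch: install the ollama CLI on the host and point it at the container.
environment.systemPackages = [ pkgs.ollama ];
environment.variables.OLLAMA_HOST = "http://127.0.0.1:11434";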