NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Build failure: ollama-rocm #344182

Closed. Raroh73 closed this issue 1 month ago

Raroh73 commented 1 month ago

Steps To Reproduce

Steps to reproduce the behavior:

  1. services.ollama = {
       enable = true;
       acceleration = "rocm";
     };
  2. sudo nixos-rebuild --flake . switch

Build log

error: builder for '/nix/store/61lr3bfvwm5nc3gk2adxsglkcwcxb23s-ollama-0.3.11.drv' failed with exit code 1;
       last 10 log lines:
       > ++ grep -v amd64/rocm
       > ++ grep -e rocm -e amdgpu -e libtinfo -e libnuma -e libelf
       > + for dep in $(ldd "${BUILD_DIR}/bin/ollama_llama_server" | grep "=>" | cut -f2 -d= | cut -f2 -d' ' | grep -v "${GOARCH}/rocm${ROCM_VARIANT}" | grep -e rocm -e amdgpu -e libtinfo -e libnuma -e libelf)
       > + cp -a /nix/store/d7wl4hnydqbqc2j1qg29sybpc614wkz8-rocm-path/lib/libhipblas.so.2 /nix/store/d7wl4hnydqbqc2j1qg29sybpc614wkz8-rocm-path/lib/libhipblas.so.2.0 ../../dist/linux-amd64//../linux-amd64-rocm/lib/ollama
       > ++ readlink -f /nix/store/d7wl4hnydqbqc2j1qg29sybpc614wkz8-rocm-path/lib/libhipblas.so.2
       > + '[' /nix/store/2c04lrnax0x0jcdrdins3wykm1lb1360-hipblas-6.0.2/lib/libhipblas.so.2.0 '!=' /nix/store/d7wl4hnydqbqc2j1qg29sybpc614wkz8-rocm-path/lib/libhipblas.so.2 ']'
       > ++ readlink -f /nix/store/d7wl4hnydqbqc2j1qg29sybpc614wkz8-rocm-path/lib/libhipblas.so.2
       > + cp /nix/store/2c04lrnax0x0jcdrdins3wykm1lb1360-hipblas-6.0.2/lib/libhipblas.so.2.0 ../../dist/linux-amd64//../linux-amd64-rocm/lib/ollama
       > cp: '/nix/store/2c04lrnax0x0jcdrdins3wykm1lb1360-hipblas-6.0.2/lib/libhipblas.so.2.0' and '../../dist/linux-amd64//../linux-amd64-rocm/lib/ollama/libhipblas.so.2.0' are the same file
       > llm/generate/generate_linux.go:3: running "bash": exit status 1
       For full logs, run 'nix log /nix/store/61lr3bfvwm5nc3gk2adxsglkcwcxb23s-ollama-0.3.11.drv'.
error: 1 dependencies of derivation '/nix/store/rma2ywdlziqa54mhajripg0c9qzrl6ps-system-path.drv' failed to build
error: 1 dependencies of derivation '/nix/store/f2nwpv9inxshay6hjdnnpibid6abay6r-unit-ollama.service.drv' failed to build
error: 1 dependencies of derivation '/nix/store/g7w4a13b3j7q4yf5jxxgnd05wdhs2x9d-nixos-system-earth-24.11.20240921.9357f4f.drv' failed to build
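
Reading the trace above, one plausible explanation is that `cp -a` first copies the entries from the `rocm-path` symlink farm into the dist directory as symlinks, and the follow-up plain `cp` of the `readlink -f` target then lands on a destination that already points back at its own source. The sketch below reproduces that sequence in a scratch directory. It is an assumption drawn from the log, not the upstream build script: `libfoo` and the `store`, `joined` and `dist` directories are hypothetical stand-ins for hipblas, the hipblas store path, `rocm-path` and the ollama dist directory.

```shell
#!/usr/bin/env bash
# Minimal sketch; every path and library name here is hypothetical.
set -u
work=$(mktemp -d)
mkdir -p "$work/store/lib" "$work/joined/lib" "$work/dist"

# A "real" library plus its soname symlink, standing in for the hipblas store path.
echo "fake library" > "$work/store/lib/libfoo.so.2.0"
ln -s libfoo.so.2.0 "$work/store/lib/libfoo.so.2"

# A symlink farm standing in for rocm-path: every entry is an absolute symlink.
ln -s "$work/store/lib/libfoo.so.2"   "$work/joined/lib/libfoo.so.2"
ln -s "$work/store/lib/libfoo.so.2.0" "$work/joined/lib/libfoo.so.2.0"

# Step 1: cp -a preserves symlinks, so dist/ now holds links into store/.
cp -a "$work/joined/lib/libfoo.so.2" "$work/joined/lib/libfoo.so.2.0" "$work/dist"

# Step 2: copy the fully resolved file; dist/libfoo.so.2.0 already points at it,
# so cp refuses because source and destination are the same file.
real=$(readlink -f "$work/joined/lib/libfoo.so.2")
if cp "$real" "$work/dist" 2>"$work/err"; then
  result="copied"
else
  result="same-file"
fi

echo "$result"
cat "$work/err"
rm -rf "$work"
```

If this reading is right, the second copy would need to skip or replace an existing destination entry (for example with `cp --remove-destination` on GNU coreutils); whether #344236 takes that route is not stated in this thread.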

Additional context

Notify maintainers

@abysssol @dit7ya @elohmeier @RoyDubnium

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.11.0, NixOS, 24.11 (Vicuna), 24.11.20240919.c04d565`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.5`
 - nixpkgs: `/nix/store/hiasfhl8f5yy88hcfbr3s8s4bm63wsjw-source`


SigmaSquadron commented 1 month ago

Hydra can reproduce: https://hydra.nixos.org/build/273422571

adamcstephens commented 1 month ago

I tracked down the potential source of this failure, and am working on a fix. These builds take 30 minutes for me though, so it's slow going.

adamcstephens commented 1 month ago

Please see #344236 for a fix. Testing appreciated.

deftdawg commented 1 month ago

Thanks for the quick patch.

EDIT: My way of installing to a profile didn't pull in the fix. I'll see if I can figure out a way to do it properly using the PR review package.

deftdawg commented 1 month ago

Builds and runs... segfaults as soon as I load a model, but that's an AMD driver thing (whole WM crashes)... Ship it :laughing:

cd ~/source/nixpkgs
nix run 'nixpkgs#nixpkgs-review' -- pr 344236
## Like 2 hours later

Link to currently reviewing PR:
https://github.com/NixOS/nixpkgs/pull/344236

5 packages built:
alpaca chatd ollama ollama-cuda ollama-rocm

$ /nix/store/mjlq2xzqsjl5pdv78x6zvzxqyf5bs40v-nix-2.18.7/bin/nix-shell --argstr system x86_64-linux --argstr nixpkgs-path /home/deftdawg/.cache/nixpkgs-review/pr-344236/nixpkgs --argstr nixpkgs-config-path /tmp/tmp6pv7u23r.nix --argstr attrs-path /home/deftdawg/.cache/nixpkgs-review/pr-344236/attrs.nix --nix-path 'nixpkgs=/home/deftdawg/.cache/nixpkgs-review/pr-344236/nixpkgs nixpkgs-overlays=/tmp/tmpcgi4dngf' /nix/store/rw4fbjmmhxzydm8lr3fk2s4zx0f7cflj-nixpkgs-review-2.10.5/lib/python3.12/site-packages/nixpkgs_review/nix/review-shell.nix

[nix-shell:~/.cache/nixpkgs-review/pr-344236]$ which ollama
/nix/store/j64jy41s9kcdw1xrilspkp6bvla2nlfd-ollama-0.3.11/bin/ollama

[nix-shell:~/.cache/nixpkgs-review/pr-344236]$ nix-shell -p ollama-rocm

[nix-shell:~/.cache/nixpkgs-review/pr-344236]$ ollama --version
Warning: could not connect to a running Ollama instance
Warning: client version is 0.3.11

[nix-shell:~/.cache/nixpkgs-review/pr-344236]$ which ollama
/nix/store/32yqzr1i1xchxandj5czgzrsaalixs5b-ollama-0.3.11/bin/ollama

[nix-shell:~/.cache/nixpkgs-review/pr-344236]$ HSA_OVERRIDE_GFX_VERSION="11.0.0" ollama serve
2024/09/24 14:29:28 routes.go:1153: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:11.0.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/deftdawg/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-09-24T14:29:28.352-04:00 level=INFO source=images.go:753 msg="total blobs: 19"
time=2024-09-24T14:29:28.353-04:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-09-24T14:29:28.353-04:00 level=INFO source=routes.go:1200 msg="Listening on 127.0.0.1:11434 (version 0.3.11)"
time=2024-09-24T14:29:28.389-04:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama2828138276/runners
time=2024-09-24T14:29:29.808-04:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[rocm cpu cpu_avx cpu_avx2]"
time=2024-09-24T14:29:29.808-04:00 level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2024-09-24T14:29:29.808-04:00 level=WARN source=gpu.go:668 msg="unable to locate gpu dependency libraries"
time=2024-09-24T14:29:29.808-04:00 level=WARN source=gpu.go:668 msg="unable to locate gpu dependency libraries"
time=2024-09-24T14:29:29.808-04:00 level=WARN source=gpu.go:668 msg="unable to locate gpu dependency libraries"
time=2024-09-24T14:29:29.808-04:00 level=WARN source=gpu.go:668 msg="unable to locate gpu dependency libraries"
time=2024-09-24T14:29:29.808-04:00 level=WARN source=amd_linux.go:60 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-09-24T14:29:29.809-04:00 level=INFO source=amd_linux.go:349 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.0
time=2024-09-24T14:29:29.809-04:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=rocm variant="" compute=gfx1030 driver=0.0 name=1002:73bf total="16.0 GiB" available="12.9 GiB"

adamcstephens commented 1 month ago

I did see some segfaults, but was able to restart and get it to load on my 6700xt. “Glad” to know it’s known and common. 😂

rjpcasalino commented 1 month ago

Nice! Thanks for this - excited to see it roll into unstable soon. Ollama seems to always have some issue - is there a "working group" that has some focus on this program and nix? Anyway, thanks again!

deftdawg commented 1 month ago

The crashes aren't Nix problems; they're crappy AMD driver issues. It doesn't crash if you don't load models close to the max VRAM available... obviously it shouldn't crash if you do, but AMD... so yeah 😄