Closed by Raroh73 1 month ago
Hydra can reproduce: https://hydra.nixos.org/build/273422571
I tracked down the potential source of this failure, and am working on a fix. These builds take 30 minutes for me though, so it's slow going.
Please see #344236 for a fix. Testing appreciated.
Thanks for the quick patch.
EDIT: The way I installed to my profile didn't pull in the fix. I'll see if I can figure out how to do it properly using the PR review package.
Builds and runs... segfaults as soon as I load a model, but that's an AMD driver thing (the whole WM crashes)... Ship it :laughing:
cd ~/source/nixpkgs
nix run 'nixpkgs#nixpkgs-review' -- pr 344236
## Like 2 hours later
Link to currently reviewing PR:
https://github.com/NixOS/nixpkgs/pull/344236
5 packages built:
alpaca chatd ollama ollama-cuda ollama-rocm
$ /nix/store/mjlq2xzqsjl5pdv78x6zvzxqyf5bs40v-nix-2.18.7/bin/nix-shell --argstr system x86_64-linux --argstr nixpkgs-path /home/deftdawg/.cache/nixpkgs-review/pr-344236/nixpkgs --argstr nixpkgs-config-path /tmp/tmp6pv7u23r.nix --argstr attrs-path /home/deftdawg/.cache/nixpkgs-review/pr-344236/attrs.nix --nix-path 'nixpkgs=/home/deftdawg/.cache/nixpkgs-review/pr-344236/nixpkgs nixpkgs-overlays=/tmp/tmpcgi4dngf' /nix/store/rw4fbjmmhxzydm8lr3fk2s4zx0f7cflj-nixpkgs-review-2.10.5/lib/python3.12/site-packages/nixpkgs_review/nix/review-shell.nix
[nix-shell:~/.cache/nixpkgs-review/pr-344236]$ which ollama
/nix/store/j64jy41s9kcdw1xrilspkp6bvla2nlfd-ollama-0.3.11/bin/ollama
[nix-shell:~/.cache/nixpkgs-review/pr-344236]$ nix-shell -p ollama-rocm
[nix-shell:~/.cache/nixpkgs-review/pr-344236]$ ollama --version
Warning: could not connect to a running Ollama instance
Warning: client version is 0.3.11
[nix-shell:~/.cache/nixpkgs-review/pr-344236]$ which ollama
/nix/store/32yqzr1i1xchxandj5czgzrsaalixs5b-ollama-0.3.11/bin/ollama
[nix-shell:~/.cache/nixpkgs-review/pr-344236]$ HSA_OVERRIDE_GFX_VERSION="11.0.0" ollama serve
2024/09/24 14:29:28 routes.go:1153: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:11.0.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/deftdawg/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-09-24T14:29:28.352-04:00 level=INFO source=images.go:753 msg="total blobs: 19"
time=2024-09-24T14:29:28.353-04:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-09-24T14:29:28.353-04:00 level=INFO source=routes.go:1200 msg="Listening on 127.0.0.1:11434 (version 0.3.11)"
time=2024-09-24T14:29:28.389-04:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama2828138276/runners
time=2024-09-24T14:29:29.808-04:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[rocm cpu cpu_avx cpu_avx2]"
time=2024-09-24T14:29:29.808-04:00 level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2024-09-24T14:29:29.808-04:00 level=WARN source=gpu.go:668 msg="unable to locate gpu dependency libraries"
time=2024-09-24T14:29:29.808-04:00 level=WARN source=gpu.go:668 msg="unable to locate gpu dependency libraries"
time=2024-09-24T14:29:29.808-04:00 level=WARN source=gpu.go:668 msg="unable to locate gpu dependency libraries"
time=2024-09-24T14:29:29.808-04:00 level=WARN source=gpu.go:668 msg="unable to locate gpu dependency libraries"
time=2024-09-24T14:29:29.808-04:00 level=WARN source=amd_linux.go:60 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-09-24T14:29:29.809-04:00 level=INFO source=amd_linux.go:349 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=11.0.0
time=2024-09-24T14:29:29.809-04:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=rocm variant="" compute=gfx1030 driver=0.0 name=1002:73bf total="16.0 GiB" available="12.9 GiB"
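For anyone comparing runs, the final "inference compute" line is the one that shows which backend and GPU were picked up. Here is a minimal sketch of pulling its fields out in Python. It assumes the line is plain `key=value` text with double-quoted values, as in the log above; `parse_logfmt` is a hypothetical helper name, not part of Ollama.

```python
import shlex


def parse_logfmt(line: str) -> dict[str, str]:
    """Split a key=value log line into a dict, honouring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore any token that is not of the form key=value
            fields[key] = value
    return fields


# The "inference compute" line copied from the log above.
line = (
    'time=2024-09-24T14:29:29.809-04:00 level=INFO source=types.go:107 '
    'msg="inference compute" id=0 library=rocm variant="" compute=gfx1030 '
    'driver=0.0 name=1002:73bf total="16.0 GiB" available="12.9 GiB"'
)

gpu = parse_logfmt(line)
print(gpu["library"], gpu["compute"], gpu["available"])
```

In this run `library=rocm` confirms the ROCm runner was selected. Lines that contain unbalanced quotes or apostrophes would need more careful handling than `shlex.split` gives.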
I did see some segfaults, but was able to restart and get it to load on my 6700xt. “Glad” to know it’s known and common. 😂
Nice! Thanks for this - excited to see it roll into unstable soon. Ollama seems to always have some issue - is there a "working group" that has some focus on this program and nix? Anyway, thanks again!
The crashes aren't Nix problems, they're crappy AMD driver issues. It doesn't crash if you don't load models close to the maximum available VRAM... obviously it shouldn't crash if you do, but AMD... so yeah 😄
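A rough way to apply that rule of thumb is to compare a model's size against the "available" figure Ollama logs at startup and leave some margin. The sketch below is only an illustration: the 2 GiB default margin and the function name `fits_with_headroom` are assumptions made for the example, not values documented by Ollama or AMD.

```python
def fits_with_headroom(model_gib: float, available_gib: float,
                       headroom_gib: float = 2.0) -> bool:
    """Return True if the model leaves at least headroom_gib of VRAM free.

    headroom_gib is an arbitrary illustrative margin, not a documented threshold.
    """
    return model_gib + headroom_gib <= available_gib


# 12.9 GiB is the "available" value from the log earlier in this thread.
available = 12.9
print(fits_with_headroom(8.0, available))   # an 8 GiB model leaves room
print(fits_with_headroom(12.0, available))  # a 12 GiB model is too close to the limit
```

What counts as a safe margin depends on the card and driver, so treat the number as something to tune.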
Steps To Reproduce
Steps to reproduce the behavior:
sudo nixos-rebuild --flake . switch
Build log
Additional context
Notify maintainers
@abysssol @dit7ya @elohmeier @RoyDubnium
Metadata
Please run
nix-shell -p nix-info --run "nix-info -m"
and paste the result.
Add a :+1: reaction to issues you find important.