Jeffser / Alpaca

An Ollama client made with GTK4 and Adwaita
https://jeffser.com/alpaca
GNU General Public License v3.0

Detect and inform user about unmet system requirements for running a model #336

Open CodingKoalaGeneral opened 1 month ago

CodingKoalaGeneral commented 1 month ago

Describe the bug
Detect and inform the user about unmet system requirements for running a model, such as the 200 GB+ of RAM needed for Llama 3.1 405B.

level=WARN source=server.go:136 msg="model request too large for system" requested="209.7 GiB" available=12541681664 total="14.9 GiB" free="8.5 GiB" swap="3.1 GiB"

Expected behavior
Currently Alpaca just shows the generic error message "There was an error with the local Ollama instance, so it has been reset". Please show an explanation of the actual cause (e.g. insufficient memory for the model) within the GUI.
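The numbers needed for a friendlier message are already present in the backend's error string (e.g. "model requires more system memory (209.7 GiB) than is available (11.7 GiB)" in the log below). As a rough sketch of one way the client could surface them, not Alpaca's actual code, a small parser could extract both values for a dialog; `parse_memory_error` is a hypothetical helper:

```python
import re

def parse_memory_error(message: str):
    """Extract the requested and available memory from an Ollama
    'model requires more system memory' error string.

    Returns (requested, available) as strings like '209.7 GiB',
    or None if the message has a different shape.
    """
    m = re.search(
        r"requires more system memory \(([\d.]+ [KMG]iB)\) "
        r"than is available \(([\d.]+ [KMG]iB)\)",
        message,
    )
    return (m.group(1), m.group(2)) if m else None
```

With the example from the log, this yields `("209.7 GiB", "11.7 GiB")`, which the GUI could format into "This model needs 209.7 GiB of memory, but only 11.7 GiB is available" instead of the generic reset toast.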

Screenshots
[image attached]

Debugging information

$ flatpak run com.jeffser.Alpaca 
INFO    [main.py | main] Alpaca version: 2.0.6
INFO    [connection_handler.py | start] Starting Alpaca's Ollama instance...
INFO    [connection_handler.py | start] Started Alpaca's Ollama instance
2024/10/02 04:13:04 routes.go:1153: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/user/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-10-02T04:13:04.513+02:00 level=INFO source=images.go:753 msg="total blobs: 10"
time=2024-10-02T04:13:04.514+02:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-10-02T04:13:04.514+02:00 level=INFO source=routes.go:1200 msg="Listening on 127.0.0.1:11435 (version 0.3.11)"
time=2024-10-02T04:13:04.514+02:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2756547620/runners
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/libggml.so.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/libllama.so.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/ollama_llama_server.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/libggml.so.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/libllama.so.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/ollama_llama_server.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/libggml.so.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/libllama.so.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/ollama_llama_server.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/libggml.so.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/libllama.so.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/ollama_llama_server.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/libggml.so.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/libllama.so.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/ollama_llama_server.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/libggml.so.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/libllama.so.gz
time=2024-10-02T04:13:04.514+02:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/ollama_llama_server.gz
INFO    [connection_handler.py | start] client version is 0.3.11
INFO    [connection_handler.py | request] GET : http://127.0.0.1:11435/api/tags
time=2024-10-02T04:13:11.451+02:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2756547620/runners/cpu/ollama_llama_server
time=2024-10-02T04:13:11.451+02:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2756547620/runners/cpu_avx/ollama_llama_server
time=2024-10-02T04:13:11.451+02:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2756547620/runners/cpu_avx2/ollama_llama_server
time=2024-10-02T04:13:11.451+02:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2756547620/runners/cuda_v11/ollama_llama_server
time=2024-10-02T04:13:11.451+02:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2756547620/runners/cuda_v12/ollama_llama_server
time=2024-10-02T04:13:11.451+02:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2756547620/runners/rocm_v60102/ollama_llama_server
time=2024-10-02T04:13:11.451+02:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v60102]"
time=2024-10-02T04:13:11.451+02:00 level=DEBUG source=common.go:50 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-10-02T04:13:11.451+02:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-10-02T04:13:11.451+02:00 level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2024-10-02T04:13:11.451+02:00 level=DEBUG source=gpu.go:86 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-10-02T04:13:11.451+02:00 level=DEBUG source=gpu.go:467 msg="Searching for GPU library" name=libcuda.so*
time=2024-10-02T04:13:11.451+02:00 level=DEBUG source=gpu.go:490 msg="gpu library search" globs="[/app/lib/ollama/libcuda.so* /app/lib/libcuda.so* /usr/lib/x86_64-linux-gnu/GL/default/lib/libcuda.so* /usr/lib/x86_64-linux-gnu/openh264/extra/libcuda.so* /usr/lib/x86_64-linux-gnu/openh264/extra/libcuda.so* /usr/lib/sdk/llvm15/lib/libcuda.so* /usr/lib/x86_64-linux-gnu/GL/default/lib/libcuda.so* /usr/lib/ollama/libcuda.so* /app/plugins/AMD/lib/ollama/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-10-02T04:13:11.453+02:00 level=DEBUG source=gpu.go:524 msg="discovered GPU libraries" paths=[]
time=2024-10-02T04:13:11.453+02:00 level=DEBUG source=gpu.go:467 msg="Searching for GPU library" name=libcudart.so*
time=2024-10-02T04:13:11.453+02:00 level=DEBUG source=gpu.go:490 msg="gpu library search" globs="[/app/lib/ollama/libcudart.so* /app/lib/libcudart.so* /usr/lib/x86_64-linux-gnu/GL/default/lib/libcudart.so* /usr/lib/x86_64-linux-gnu/openh264/extra/libcudart.so* /usr/lib/x86_64-linux-gnu/openh264/extra/libcudart.so* /usr/lib/sdk/llvm15/lib/libcudart.so* /usr/lib/x86_64-linux-gnu/GL/default/lib/libcudart.so* /usr/lib/ollama/libcudart.so* /app/plugins/AMD/lib/ollama/libcudart.so* /app/lib/ollama/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-10-02T04:13:11.454+02:00 level=DEBUG source=gpu.go:524 msg="discovered GPU libraries" paths="[/app/lib/ollama/libcudart.so.12.4.99 /app/lib/ollama/libcudart.so.11.3.109]"
CUDA driver version: 12-6
time=2024-10-02T04:13:11.930+02:00 level=DEBUG source=gpu.go:130 msg="detected GPUs" library=/app/lib/ollama/libcudart.so.12.4.99 count=1
[GPU-3e851937-aca5-ffff-ffff-1fb5bfc4fe1a] CUDA totalMem 8216903680
[GPU-3e851937-aca5-ffff-ffff-1fb5bfc4fe1a] CUDA freeMem 8067481600
[GPU-3e851937-aca5-ffff-ffff-1fb5bfc4fe1a] CUDA usedMem 0
[GPU-3e851937-aca5-ffff-ffff-1fb5bfc4fe1a] Compute Capability 8.9
time=2024-10-02T04:13:11.932+02:00 level=WARN source=amd_linux.go:60 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-10-02T04:13:11.932+02:00 level=DEBUG source=amd_linux.go:103 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-10-02T04:13:11.932+02:00 level=DEBUG source=amd_linux.go:128 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-10-02T04:13:11.932+02:00 level=DEBUG source=amd_linux.go:103 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-10-02T04:13:11.932+02:00 level=DEBUG source=amd_linux.go:218 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=5567 unique_id=0
time=2024-10-02T04:13:11.933+02:00 level=DEBUG source=amd_linux.go:231 msg="failed to read sysfs node" file=/sys/class/drm/card1-DP-1/device/vendor error="open /sys/class/drm/card1-DP-1/device/vendor: no such file or directory"
time=2024-10-02T04:13:11.933+02:00 level=DEBUG source=amd_linux.go:231 msg="failed to read sysfs node" file=/sys/class/drm/card1-HDMI-A-1/device/vendor error="open /sys/class/drm/card1-HDMI-A-1/device/vendor: no such file or directory"
time=2024-10-02T04:13:11.933+02:00 level=DEBUG source=amd_linux.go:231 msg="failed to read sysfs node" file=/sys/class/drm/card1-eDP-1/device/vendor error="open /sys/class/drm/card1-eDP-1/device/vendor: no such file or directory"
time=2024-10-02T04:13:11.933+02:00 level=DEBUG source=amd_linux.go:252 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card2/device
time=2024-10-02T04:13:11.933+02:00 level=INFO source=amd_linux.go:275 msg="unsupported Radeon iGPU detected skipping" id=0 total="512.0 MiB"
time=2024-10-02T04:13:11.933+02:00 level=INFO source=amd_linux.go:361 msg="no compatible amdgpu devices detected"
releasing cudart library
time=2024-10-02T04:13:12.000+02:00 level=INFO source=types.go:107 msg="inference compute" id=GPU-3e851937-aca5-7e73-ffc7-1fb5bfc4fe1a library=cuda variant=v11 compute=8.9 driver=0.0 name="" total="7.7 GiB" available="7.5 GiB"
[GIN] 2024/10/02 - 04:13:12 | 200 |   17.332089ms |       127.0.0.1 | GET      "/api/tags"
INFO    [connection_handler.py | request] POST : http://127.0.0.1:11435/api/show
[GIN] 2024/10/02 - 04:13:12 | 200 |   23.475027ms |       127.0.0.1 | POST     "/api/show"
INFO    [connection_handler.py | request] POST : http://127.0.0.1:11435/api/show
[GIN] 2024/10/02 - 04:13:12 | 200 |   28.940971ms |       127.0.0.1 | POST     "/api/show"
INFO    [connection_handler.py | request] POST : http://127.0.0.1:11435/api/show
[GIN] 2024/10/02 - 04:13:15 | 200 |   18.963542ms |       127.0.0.1 | POST     "/api/show"
INFO    [connection_handler.py | request] POST : http://127.0.0.1:11435/api/show
[GIN] 2024/10/02 - 04:13:16 | 200 |   17.823926ms |       127.0.0.1 | POST     "/api/show"
INFO    [window.py | save_history] Saving history
INFO    [connection_handler.py | request] POST : http://127.0.0.1:11435/api/chat
time=2024-10-02T04:13:29.436+02:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="14.9 GiB" before.free="8.7 GiB" before.free_swap="3.1 GiB" now.total="14.9 GiB" now.free="8.5 GiB" now.free_swap="3.1 GiB"
CUDA driver version: 12-6
[GPU-3e851937-aca5-ffff-ffff-1fb5bfc4fe1a] CUDA totalMem 8216903680
[GPU-3e851937-aca5-ffff-ffff-1fb5bfc4fe1a] CUDA freeMem 8067481600
[GPU-3e851937-aca5-ffff-ffff-1fb5bfc4fe1a] CUDA usedMem 0
[GPU-3e851937-aca5-ffff-ffff-1fb5bfc4fe1a] Compute Capability 8.9
time=2024-10-02T04:13:29.553+02:00 level=DEBUG source=gpu.go:406 msg="updating cuda memory data" gpu=GPU-3e851937-ffff-ffff-ffff-1fb5bfc4fe1a name="" overhead="0 B" before.total="7.7 GiB" before.free="7.5 GiB" now.total="7.7 GiB" now.free="7.5 GiB" now.used="0 B"
releasing cudart library
time=2024-10-02T04:13:29.622+02:00 level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x819f20 gpu_count=1
time=2024-10-02T04:13:29.647+02:00 level=DEBUG source=sched.go:224 msg="loading first model" model=/home/user/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-939fd971f03801a9447af720a78d1fc00833cadf05252b4fc871bfb70eafdda6
time=2024-10-02T04:13:29.647+02:00 level=DEBUG source=memory.go:103 msg=evaluating library=cuda gpu_count=1 available="[7.5 GiB]"
time=2024-10-02T04:13:29.648+02:00 level=DEBUG source=memory.go:103 msg=evaluating library=cuda gpu_count=1 available="[7.5 GiB]"
time=2024-10-02T04:13:29.649+02:00 level=DEBUG source=memory.go:103 msg=evaluating library=cuda gpu_count=1 available="[7.5 GiB]"
time=2024-10-02T04:13:29.650+02:00 level=DEBUG source=memory.go:103 msg=evaluating library=cuda gpu_count=1 available="[7.5 GiB]"
time=2024-10-02T04:13:29.652+02:00 level=INFO source=server.go:103 msg="system memory" total="14.9 GiB" free="8.5 GiB" free_swap="3.1 GiB"
time=2024-10-02T04:13:29.652+02:00 level=DEBUG source=memory.go:103 msg=evaluating library=cuda gpu_count=1 available="[7.5 GiB]"
time=2024-10-02T04:13:29.652+02:00 level=WARN source=server.go:136 msg="model request too large for system" requested="209.7 GiB" available=12541681664 total="14.9 GiB" free="8.5 GiB" swap="3.1 GiB"
time=2024-10-02T04:13:29.652+02:00 level=INFO source=sched.go:428 msg="NewLlamaServer failed" model=/home/user/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-939fd971f03801a9447af720a78d1fc00833cadf05252b4fc871bfb70eafdda6 error="model requires more system memory (209.7 GiB) than is available (11.7 GiB)"
[GIN] 2024/10/02 - 04:13:29 | 500 |  234.574526ms |       127.0.0.1 | POST     "/api/chat"
ERROR   [window.py | run_message] Network Error
ERROR   [window.py | connection_error] Connection error
INFO    [connection_handler.py | reset] Resetting Alpaca's Ollama instance
INFO    [connection_handler.py | stop] Stopping Alpaca's Ollama instance
time=2024-10-02T04:13:29.655+02:00 level=DEBUG source=sched.go:119 msg="shutting down scheduler pending loop"
time=2024-10-02T04:13:29.655+02:00 level=DEBUG source=common.go:73 msg="cleaning up" dir=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama2756547620
time=2024-10-02T04:13:29.655+02:00 level=DEBUG source=sched.go:318 msg="shutting down scheduler completed loop"
INFO    [connection_handler.py | stop] Stopped Alpaca's Ollama instance
INFO    [connection_handler.py | start] Starting Alpaca's Ollama instance...
INFO    [connection_handler.py | start] Started Alpaca's Ollama instance
2024/10/02 04:13:31 routes.go:1153: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/user/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-10-02T04:13:31.160+02:00 level=INFO source=images.go:753 msg="total blobs: 10"
time=2024-10-02T04:13:31.160+02:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-10-02T04:13:31.160+02:00 level=INFO source=routes.go:1200 msg="Listening on 127.0.0.1:11435 (version 0.3.11)"
INFO    [connection_handler.py | start] client version is 0.3.11
INFO    [window.py | show_toast] There was an error with the local Ollama instance, so it has been reset
time=2024-10-02T04:13:31.160+02:00 level=INFO source=common.go:135 msg="extracting embedded files" dir=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama3472458768/runners
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/libggml.so.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/libllama.so.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/ollama_llama_server.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/libggml.so.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/libllama.so.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/ollama_llama_server.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/libggml.so.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/libllama.so.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/ollama_llama_server.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/libggml.so.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/libllama.so.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/ollama_llama_server.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/libggml.so.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/libllama.so.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/ollama_llama_server.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/libggml.so.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/libllama.so.gz
time=2024-10-02T04:13:31.161+02:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/ollama_llama_server.gz
time=2024-10-02T04:13:37.918+02:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama3472458768/runners/cpu/ollama_llama_server
time=2024-10-02T04:13:37.918+02:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama3472458768/runners/cpu_avx/ollama_llama_server
time=2024-10-02T04:13:37.918+02:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama3472458768/runners/cpu_avx2/ollama_llama_server
time=2024-10-02T04:13:37.918+02:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama3472458768/runners/cuda_v11/ollama_llama_server
time=2024-10-02T04:13:37.918+02:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama3472458768/runners/cuda_v12/ollama_llama_server
time=2024-10-02T04:13:37.918+02:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/home/user/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama3472458768/runners/rocm_v60102/ollama_llama_server
time=2024-10-02T04:13:37.918+02:00 level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cuda_v12 rocm_v60102 cpu cpu_avx cpu_avx2 cuda_v11]"
time=2024-10-02T04:13:37.918+02:00 level=DEBUG source=common.go:50 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-10-02T04:13:37.918+02:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-10-02T04:13:37.918+02:00 level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2024-10-02T04:13:37.918+02:00 level=DEBUG source=gpu.go:86 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-10-02T04:13:37.918+02:00 level=DEBUG source=gpu.go:467 msg="Searching for GPU library" name=libcuda.so*
time=2024-10-02T04:13:37.918+02:00 level=DEBUG source=gpu.go:490 msg="gpu library search" globs="[/app/lib/ollama/libcuda.so* /app/lib/libcuda.so* /usr/lib/x86_64-linux-gnu/GL/default/lib/libcuda.so* /usr/lib/x86_64-linux-gnu/openh264/extra/libcuda.so* /usr/lib/x86_64-linux-gnu/openh264/extra/libcuda.so* /usr/lib/sdk/llvm15/lib/libcuda.so* /usr/lib/x86_64-linux-gnu/GL/default/lib/libcuda.so* /usr/lib/ollama/libcuda.so* /app/plugins/AMD/lib/ollama/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-10-02T04:13:37.919+02:00 level=DEBUG source=gpu.go:524 msg="discovered GPU libraries" paths=[]
time=2024-10-02T04:13:37.919+02:00 level=DEBUG source=gpu.go:467 msg="Searching for GPU library" name=libcudart.so*
time=2024-10-02T04:13:37.919+02:00 level=DEBUG source=gpu.go:490 msg="gpu library search" globs="[/app/lib/ollama/libcudart.so* /app/lib/libcudart.so* /usr/lib/x86_64-linux-gnu/GL/default/lib/libcudart.so* /usr/lib/x86_64-linux-gnu/openh264/extra/libcudart.so* /usr/lib/x86_64-linux-gnu/openh264/extra/libcudart.so* /usr/lib/sdk/llvm15/lib/libcudart.so* /usr/lib/x86_64-linux-gnu/GL/default/lib/libcudart.so* /usr/lib/ollama/libcudart.so* /app/plugins/AMD/lib/ollama/libcudart.so* /app/lib/ollama/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-10-02T04:13:37.920+02:00 level=DEBUG source=gpu.go:524 msg="discovered GPU libraries" paths="[/app/lib/ollama/libcudart.so.12.4.99 /app/lib/ollama/libcudart.so.11.3.109]"
CUDA driver version: 12-6
time=2024-10-02T04:13:38.379+02:00 level=DEBUG source=gpu.go:130 msg="detected GPUs" library=/app/lib/ollama/libcudart.so.12.4.99 count=1
[GPU-3e851937-aca5-ffff-ffff-1fb5bfc4fe1a] CUDA totalMem 8216903680
[GPU-3e851937-aca5-ffff-ffff-1fb5bfc4fe1a] CUDA freeMem 8067481600
[GPU-3e851937-aca5-ffff-ffff-1fb5bfc4fe1a] CUDA usedMem 0
[GPU-3e851937-aca5-ffff-ffff-1fb5bfc4fe1a] Compute Capability 8.9
time=2024-10-02T04:13:38.380+02:00 level=WARN source=amd_linux.go:60 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-10-02T04:13:38.380+02:00 level=DEBUG source=amd_linux.go:103 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-10-02T04:13:38.380+02:00 level=DEBUG source=amd_linux.go:128 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-10-02T04:13:38.380+02:00 level=DEBUG source=amd_linux.go:103 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-10-02T04:13:38.380+02:00 level=DEBUG source=amd_linux.go:218 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=5567 unique_id=0
time=2024-10-02T04:13:38.381+02:00 level=DEBUG source=amd_linux.go:231 msg="failed to read sysfs node" file=/sys/class/drm/card1-DP-1/device/vendor error="open /sys/class/drm/card1-DP-1/device/vendor: no such file or directory"
time=2024-10-02T04:13:38.381+02:00 level=DEBUG source=amd_linux.go:231 msg="failed to read sysfs node" file=/sys/class/drm/card1-HDMI-A-1/device/vendor error="open /sys/class/drm/card1-HDMI-A-1/device/vendor: no such file or directory"
time=2024-10-02T04:13:38.381+02:00 level=DEBUG source=amd_linux.go:231 msg="failed to read sysfs node" file=/sys/class/drm/card1-eDP-1/device/vendor error="open /sys/class/drm/card1-eDP-1/device/vendor: no such file or directory"
time=2024-10-02T04:13:38.381+02:00 level=DEBUG source=amd_linux.go:252 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card2/device
time=2024-10-02T04:13:38.381+02:00 level=INFO source=amd_linux.go:275 msg="unsupported Radeon iGPU detected skipping" id=0 total="512.0 MiB"
time=2024-10-02T04:13:38.381+02:00 level=INFO source=amd_linux.go:361 msg="no compatible amdgpu devices detected"
releasing cudart library
time=2024-10-02T04:13:38.444+02:00 level=INFO source=types.go:107 msg="inference compute" id=GPU-3e851937-aca5-7e73-ffc7-1fb5bfc4fe1a library=cuda variant=v11 compute=8.9 driver=0.0 name="" total="7.7 GiB" available="7.5 GiB"
Jeffser commented 1 month ago

Hi, I added a toast message for when the model is too big to be loaded:

https://github.com/Jeffser/Alpaca/commit/115e22e52c2ff976752591a0568137881f4ed63d
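Beyond reacting to the error after the fact, a pre-flight check could warn before the chat request is even sent, by comparing the model's on-disk size (the `size` field in Ollama's `/api/tags` listing) against total system RAM from `/proc/meminfo`. A minimal sketch under those assumptions; `fits_in_ram` is a hypothetical helper, and since the real runtime footprint (KV cache, overhead) exceeds file size, this only catches gross mismatches like a 200+ GiB model on a 15 GiB machine:

```python
def fits_in_ram(model_size_bytes: int, meminfo_text: str) -> bool:
    """Rough heuristic: does the model's on-disk size fit in total RAM?

    model_size_bytes: the "size" field from Ollama's /api/tags listing.
    meminfo_text: the contents of /proc/meminfo.

    A True result is necessary but not sufficient, since actual
    memory use at load time is larger than the blob on disk.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # /proc/meminfo reports the value in kB
            total_bytes = int(line.split()[1]) * 1024
            return model_size_bytes <= total_bytes
    raise ValueError("MemTotal not found in meminfo")
```

On the system in the log above (14.9 GiB total), this would flag the ~200 GiB Llama 3.1 405B download before any load attempt, while letting an 8 GiB model through.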