ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Support more AMD GPUs like `gfx90c` #6110

Closed: James4Ever0 closed this issue 2 months ago

James4Ever0 commented 6 months ago

Ollama uses llama.cpp under the hood.

I can trick Ollama into using the GPU, but loading the model takes forever.

Procedures:

Logs:

time=2024-03-10T22:51:10.851+08:00 level=INFO source=images.go:806 msg="total blobs: 34"
time=2024-03-10T22:51:10.852+08:00 level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-10T22:51:10.852+08:00 level=INFO source=routes.go:1082 msg="Listening on 127.0.0.1:11434 (version 0.1.29)"
time=2024-03-10T22:51:10.852+08:00 level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama1384499486 ..."
time=2024-03-10T22:51:13.122+08:00 level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [rocm_v60000 cuda_v11 cpu_avx cpu cpu_avx2]"
time=2024-03-10T22:51:13.122+08:00 level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-10T22:51:13.122+08:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-10T22:51:13.124+08:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-10T22:51:13.124+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-10T22:51:13.124+08:00 level=INFO source=amd_linux.go:47 msg="AMD Driver: 6.2.4"
time=2024-03-10T22:51:13.124+08:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx9012]"
time=2024-03-10T22:51:13.125+08:00 level=INFO source=amd_linux.go:235 msg="[1] amdgpu totalMemory 8589934592"
time=2024-03-10T22:51:13.125+08:00 level=INFO source=amd_linux.go:236 msg="[1] amdgpu freeMemory  7956676608"
[GIN] 2024/03/10 - 22:51:24 | 200 |      30.317µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/03/10 - 22:51:24 | 200 |     657.471µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/03/10 - 22:51:24 | 200 |       186.7µs |       127.0.0.1 | POST     "/api/show"
time=2024-03-10T22:51:24.548+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-10T22:51:24.548+08:00 level=INFO source=amd_linux.go:47 msg="AMD Driver: 6.2.4"
time=2024-03-10T22:51:24.548+08:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx9012]"
time=2024-03-10T22:51:24.548+08:00 level=INFO source=amd_linux.go:235 msg="[1] amdgpu totalMemory 8589934592"
time=2024-03-10T22:51:24.548+08:00 level=INFO source=amd_linux.go:236 msg="[1] amdgpu freeMemory  7954382848"
time=2024-03-10T22:51:24.548+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-10T22:51:24.548+08:00 level=INFO source=amd_linux.go:47 msg="AMD Driver: 6.2.4"
time=2024-03-10T22:51:24.548+08:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx9012]"
time=2024-03-10T22:51:24.549+08:00 level=INFO source=amd_linux.go:235 msg="[1] amdgpu totalMemory 8589934592"
time=2024-03-10T22:51:24.549+08:00 level=INFO source=amd_linux.go:236 msg="[1] amdgpu freeMemory  7954382848"
time=2024-03-10T22:51:24.549+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
loading library /tmp/ollama1384499486/rocm_v60000/libext_server.so
time=2024-03-10T22:51:24.581+08:00 level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama1384499486/rocm_v60000/libext_server.so"
time=2024-03-10T22:51:24.581+08:00 level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 9.0, VMM: no
llama_model_loader: loaded meta data with 20 key-value pairs and 325 tensors from /root/.ollama/models/blobs/sha256:04778965089b91318ad61d0995b7e44fad4b9a9f4e049d7be90932bf8812e828 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = phi2
llama_model_loader: - kv   1:                               general.name str              = Phi2
llama_model_loader: - kv   2:                        phi2.context_length u32              = 2048
llama_model_loader: - kv   3:                      phi2.embedding_length u32              = 2560
llama_model_loader: - kv   4:                   phi2.feed_forward_length u32              = 10240
llama_model_loader: - kv   5:                           phi2.block_count u32              = 32
llama_model_loader: - kv   6:                  phi2.attention.head_count u32              = 32
llama_model_loader: - kv   7:               phi2.attention.head_count_kv u32              = 32
llama_model_loader: - kv   8:          phi2.attention.layer_norm_epsilon f32              = 0.000010
llama_model_loader: - kv   9:                  phi2.rope.dimension_count u32              = 32
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,51200]   = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,51200]   = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  15:                      tokenizer.ggml.merges arr[str,50000]   = ["Ġ t", "Ġ a", "h e", "i n", "r e",...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 50256
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 50256
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 50256
llama_model_loader: - kv  19:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  195 tensors
llama_model_loader: - type q4_0:  129 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: mismatch in special tokens definition ( 910/51200 vs 944/51200 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = phi2
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 51200
llm_load_print_meta: n_merges         = 50000
llm_load_print_meta: n_ctx_train      = 2048
llm_load_print_meta: n_embd           = 2560
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 32
llm_load_print_meta: n_embd_head_k    = 80
llm_load_print_meta: n_embd_head_v    = 80
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 2560
llm_load_print_meta: n_embd_v_gqa     = 2560
llm_load_print_meta: f_norm_eps       = 1.0e-05
llm_load_print_meta: f_norm_rms_eps   = 0.0e+00
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 10240
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 2048
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 3B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 2.78 B
llm_load_print_meta: model size       = 1.49 GiB (4.61 BPW) 
llm_load_print_meta: general.name     = Phi2
llm_load_print_meta: BOS token        = 50256 '<|endoftext|>'
llm_load_print_meta: EOS token        = 50256 '<|endoftext|>'
llm_load_print_meta: UNK token        = 50256 '<|endoftext|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_tensors: ggml ctx size =    0.25 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  1456.19 MiB
llm_load_tensors:        CPU buffer size =    70.31 MiB
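The logs report `gfx9012`, while the issue title names `gfx90c`. Assuming (not confirmed from the Ollama source) that Ollama reads the KFD `gfx_target_version` property, encoded as major*10000 + minor*100 + stepping, and prints each field in decimal, the raw value here would be 90012. LLVM target names write minor and stepping as hex digits, so under that assumption `gfx9012` and `gfx90c` are the same APU. A minimal sketch of the conversion, with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: convert a KFD gfx_target_version integer
# (assumed encoding: major*10000 + minor*100 + stepping) into the
# LLVM-style target name, where minor and stepping are single hex digits.
gfx_name_from_target_version() {
    v=$1
    major=$((v / 10000))
    minor=$((v / 100 % 100))
    stepping=$((v % 100))
    printf 'gfx%d%x%x\n' "$major" "$minor" "$stepping"
}

# Examples:
#   gfx_name_from_target_version 90012    -> gfx90c
#   gfx_name_from_target_version 100300   -> gfx1030
# On a live system the raw values can be listed with:
#   grep gfx_target_version /sys/class/kfd/kfd/topology/nodes/*/properties
```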
hiepxanh commented 6 months ago

@James4Ever0 Ollama is just a wrapper for llama.cpp, and llama.cpp runs AI models through its CUDA or ROCm backend. ROCm currently has support, but it has some issues with gfx90c, so you should open an issue on the AMD repo. I think someone has already opened one here: https://github.com/ROCm/ROCm/issues/2774

James4Ever0 commented 6 months ago

I found two possible solutions in those sources. I'll try them later and report back.

One is `export HSA_ENABLE_SDMA=0`; the other is disabling amdgpu power features via kernel boot parameters:

sudo vim /etc/default/grub
# add "amdgpu.ppfeaturemask=0xffff3fff amdgpu.runpm=0x0" to GRUB_CMDLINE_LINUX_DEFAULT
sudo update-grub
sudo reboot
# after rebooting, check that the new parameters took effect
cat /proc/cmdline
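The post-reboot `cat /proc/cmdline` check can be made explicit. The sketch below (helper names are hypothetical) tests whether each parameter appears as a whole word in a kernel command line, and shows which power-play feature bits the suggested mask clears relative to all-ones. By my reading of the kernel's `amd_shared.h`, the cleared bits 0x4000 and 0x8000 are `PP_OVERDRIVE_MASK` and `PP_GFXOFF_MASK`, so this mask mainly turns off GFXOFF; treat that mapping as an assumption to verify against your kernel version.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every given parameter appears as a
# whole space-separated word in the kernel command line passed as $1.
has_kernel_params() {
    cmdline=" $1 "
    shift
    for p in "$@"; do
        case "$cmdline" in
            *" $p "*) ;;                              # found, keep checking
            *) echo "missing: $p" >&2; return 1 ;;
        esac
    done
    return 0
}

# Hypothetical helper: print the bits a ppfeaturemask value clears
# compared with 0xffffffff (all features enabled).
cleared_pp_bits() {
    printf '0x%x\n' $(( ~$1 & 0xffffffff ))
}

# Usage on the rebooted machine:
#   has_kernel_params "$(cat /proc/cmdline)" \
#       amdgpu.ppfeaturemask=0xffff3fff amdgpu.runpm=0x0 && echo ok
#   cleared_pp_bits 0xffff3fff    # prints 0xc000 (bits 14 and 15)
```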
hiepxanh commented 6 months ago

Oh, that's great. I hope you can write up a path for others to follow.

James4Ever0 commented 6 months ago

It works, but using system RAM as VRAM, instead of being limited by the BIOS VRAM setting, needs further investigation.
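For context, the `totalMemory 8589934592` in the logs is exactly 8 GiB (8 × 1024³ bytes), which presumably matches the BIOS VRAM (UMA) carve-out. The amdgpu driver exposes both that carve-out and the GTT pool (system RAM the GPU can map) through sysfs files such as `mem_info_vram_total` and `mem_info_gtt_total`. A minimal sketch for comparing the two; the helper names are hypothetical, and card numbering varies between machines, so it scans every card:

```shell
#!/bin/sh
# Convert a byte count to GiB, truncated to two decimals (integer math only).
bytes_to_gib() {
    gib_x100=$(( $1 * 100 / 1073741824 ))
    printf '%d.%02d GiB\n' $((gib_x100 / 100)) $((gib_x100 % 100))
}

# Print VRAM and GTT totals for every amdgpu card found in sysfs.
show_amdgpu_pools() {
    for card in /sys/class/drm/card*; do
        case "$card" in *-*) continue ;; esac          # skip connectors like card0-DP-1
        dev="$card/device"
        [ -r "$dev/mem_info_vram_total" ] || continue  # not an amdgpu device
        echo "$dev"
        printf '  VRAM: '; bytes_to_gib "$(cat "$dev/mem_info_vram_total")"
        printf '  GTT:  '; bytes_to_gib "$(cat "$dev/mem_info_gtt_total")"
    done
}

# Usage: show_amdgpu_pools
```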

When I compile force-host-alloction-APU with the script below and run Ollama with the resulting shared library preloaded, I get a segmentation fault.

The script (run as root):

CUDA_PATH=/usr/ HIP_PLATFORM="amd" /opt/rocm/bin/amdclang forcegttalloc.c -o libforcegttalloc.so  -shared -fPIC
env HSA_OVERRIDE_GFX_VERSION=9.0.0 HSA_ENABLE_SDMA=0 LD_PRELOAD=./libforcegttalloc.so ollama serve
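`HSA_OVERRIDE_GFX_VERSION` expects a `major.minor.stepping` triple in decimal, so the `9.0.0` above makes the ROCm runtime report the APU as gfx900, whereas the APU's own triple would be 9.0.12. The sketch below derives the triple from a target name; the function name is hypothetical, and it assumes the last two characters of the name are single hex digits for minor and stepping.

```shell
#!/bin/sh
# Hypothetical helper: turn an LLVM target name such as gfx90c into the
# major.minor.stepping triple used by HSA_OVERRIDE_GFX_VERSION.
# Assumes everything between "gfx" and the last two characters is the
# decimal major version, and the last two characters are hex digits.
hsa_override_from_gfx() {
    name=${1#gfx}                 # e.g. 90c or 1030
    major=${name%??}              # all but the last two characters
    last2=${name#"$major"}        # minor + stepping, e.g. 0c
    minor=${last2%?}
    stepping=${last2#?}
    # printf accepts C-style 0x constants, converting hex digits to decimal
    printf '%d.%d.%d\n' "$major" "0x$minor" "0x$stepping"
}

# hsa_override_from_gfx gfx90c   -> 9.0.12 (the APU itself)
# hsa_override_from_gfx gfx900   -> 9.0.0  (the value used in the command above)
```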

Full log:

time=2024-03-19T23:13:34.498+08:00 level=INFO source=images.go:806 msg="total blobs: 34"
time=2024-03-19T23:13:34.498+08:00 level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-19T23:13:34.499+08:00 level=INFO source=routes.go:1082 msg="Listening on 127.0.0.1:11434 (version 0.1.29)"
time=2024-03-19T23:13:34.499+08:00 level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama3447979137 ..."
time=2024-03-19T23:13:36.767+08:00 level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [cuda_v11 cpu_avx2 cpu_avx cpu rocm_v60000]"
time=2024-03-19T23:13:36.767+08:00 level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-19T23:13:36.767+08:00 level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-19T23:13:36.769+08:00 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-19T23:13:36.769+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-19T23:13:36.769+08:00 level=INFO source=amd_linux.go:47 msg="AMD Driver: 6.2.4"
time=2024-03-19T23:13:36.769+08:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx9012]"
time=2024-03-19T23:13:36.769+08:00 level=INFO source=amd_linux.go:235 msg="[1] amdgpu totalMemory 8589934592"
time=2024-03-19T23:13:36.769+08:00 level=INFO source=amd_linux.go:236 msg="[1] amdgpu freeMemory  8074760192"
[GIN] 2024/03/19 - 23:13:42 | 200 |      37.871µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/03/19 - 23:13:42 | 200 |     374.398µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/03/19 - 23:13:42 | 200 |     209.141µs |       127.0.0.1 | POST     "/api/show"
time=2024-03-19T23:13:42.447+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-19T23:13:42.447+08:00 level=INFO source=amd_linux.go:47 msg="AMD Driver: 6.2.4"
time=2024-03-19T23:13:42.448+08:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx9012]"
time=2024-03-19T23:13:42.448+08:00 level=INFO source=amd_linux.go:235 msg="[1] amdgpu totalMemory 8589934592"
time=2024-03-19T23:13:42.448+08:00 level=INFO source=amd_linux.go:236 msg="[1] amdgpu freeMemory  8072663040"
time=2024-03-19T23:13:42.448+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-19T23:13:42.448+08:00 level=INFO source=amd_linux.go:47 msg="AMD Driver: 6.2.4"
time=2024-03-19T23:13:42.448+08:00 level=INFO source=amd_linux.go:85 msg="detected amdgpu versions [gfx9012]"
time=2024-03-19T23:13:42.448+08:00 level=INFO source=amd_linux.go:235 msg="[1] amdgpu totalMemory 8589934592"
time=2024-03-19T23:13:42.448+08:00 level=INFO source=amd_linux.go:236 msg="[1] amdgpu freeMemory  8072663040"
time=2024-03-19T23:13:42.448+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
loading library /tmp/ollama3447979137/rocm_v60000/libext_server.so
time=2024-03-19T23:13:42.477+08:00 level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama3447979137/rocm_v60000/libext_server.so"
time=2024-03-19T23:13:42.477+08:00 level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 9.0, VMM: no
SIGSEGV: segmentation violation
PC=0x0 m=3 sigcode=1 addr=0x0
signal arrived during cgo execution

goroutine 39 gp=0xc000501340 m=3 mp=0xc00007d008 [syscall]:
runtime.cgocall(0xebabd0, 0xc0000486f8)
    /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0000486d0 sp=0xc000048698 pc=0x40a72b
github.com/jmorganca/ollama/llm._Cfunc_dyn_llama_server_init({0x7fe35c0014b0, 0x7fe31267c8f0, 0x7fe31267d110, 0x7fe31267d1a0, 0x7fe31267d420, 0x7fe31267d660, 0x7fe31267dfc0, 0x7fe31267dfa0, 0x7fe31267e0d0, 0x7fe31267e6e0, ...}, ...)
    _cgo_gotypes.go:286 +0x45 fp=0xc0000486f8 sp=0xc0000486d0 pc=0xce58e5
github.com/jmorganca/ollama/llm.newDynExtServer.func7(0xc0003a8000, 0xc000ed4030)
    /go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:154 +0x112 fp=0xc000048838 sp=0xc0000486f8 pc=0xce6f92
github.com/jmorganca/ollama/llm.newDynExtServer({0xc000132a40, 0x32}, {0xc0004b4230, _}, {_, _, _}, {0x0, 0x0, 0x0}, ...)
    /go/src/github.com/jmorganca/ollama/llm/dyn_ext_server.go:154 +0xb50 fp=0xc000048a80 sp=0xc000048838 pc=0xce6bd0
github.com/jmorganca/ollama/llm.newLlmServer({{_, _, _}, {_, _}, {_, _}}, {_, _}, {0x0, ...}, ...)
    /go/src/github.com/jmorganca/ollama/llm/llm.go:166 +0x4c5 fp=0xc000048c40 sp=0xc000048a80 pc=0xce3165
github.com/jmorganca/ollama/llm.New({0xc0004b4230, 0x62}, {0x0, 0x0, 0x0}, {0x0, _, _}, {{0x0, 0x800, ...}, ...})
    /go/src/github.com/jmorganca/ollama/llm/llm.go:131 +0x90e fp=0xc000048ed8 sp=0xc000048c40 pc=0xce2b0e
github.com/jmorganca/ollama/server.load(0xc0004bc000?, 0xc0004bc000, {{0x0, 0x800, 0x200, 0x1, 0xffffffffffffffff, 0x0, 0x0, 0x1, ...}, ...}, ...)
    /go/src/github.com/jmorganca/ollama/server/routes.go:84 +0x325 fp=0xc000049028 sp=0xc000048ed8 pc=0xe93665
github.com/jmorganca/ollama/server.ChatHandler(0xc000161500)
    /go/src/github.com/jmorganca/ollama/server/routes.go:1236 +0xa37 fp=0xc000049730 sp=0xc000049028 pc=0xe9f277
github.com/gin-gonic/gin.(*Context).Next(0xc000161500)
    /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174 +0x2b fp=0xc000049750 sp=0xc000049730 pc=0xe66eeb
github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.allowedHostsMiddleware.func3(0xc000161500)
    /go/src/github.com/jmorganca/ollama/server/routes.go:973 +0x115 fp=0xc0000497a8 sp=0xc000049750 pc=0xe9d9f5
github.com/gin-gonic/gin.(*Context).Next(...)
    /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.CustomRecoveryWithWriter.func1(0xc000161500)
    /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/recovery.go:102 +0x7a fp=0xc0000497f8 sp=0xc0000497a8 pc=0xe73dda
github.com/gin-gonic/gin.(*Context).Next(...)
    /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.LoggerWithConfig.func1(0xc000161500)
    /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/logger.go:240 +0xdd fp=0xc0000499a8 sp=0xc0000497f8 pc=0xe72f1d
github.com/gin-gonic/gin.(*Context).Next(...)
    /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.(*Engine).handleHTTPRequest(0xc0000f61a0, 0xc000161500)
    /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620 +0x66e fp=0xc000049b28 sp=0xc0000499a8 pc=0xe7240e
github.com/gin-gonic/gin.(*Engine).ServeHTTP(0xc0000f61a0, {0x11652830, 0xc00010ea80}, 0xc0000f8240)
    /root/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576 +0x1b2 fp=0xc000049b60 sp=0xc000049b28 pc=0xe71bd2
net/http.serverHandler.ServeHTTP({0x11650710?}, {0x11652830?, 0xc00010ea80?}, 0x6?)
    /usr/local/go/src/net/http/server.go:3137 +0x8e fp=0xc000049b90 sp=0xc000049b60 pc=0x6fef4e
net/http.(*conn).serve(0xc0001301b0, {0x11654be8, 0xc00050ec60})
    /usr/local/go/src/net/http/server.go:2039 +0x5e8 fp=0xc000049fb8 sp=0xc000049b90 pc=0x6fa308
net/http.(*Server).Serve.gowrap3()
    /usr/local/go/src/net/http/server.go:3285 +0x28 fp=0xc000049fe0 sp=0xc000049fb8 pc=0x6ff768
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x4742e1
created by net/http.(*Server).Serve in goroutine 1
    /usr/local/go/src/net/http/server.go:3285 +0x4b4

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0xc000054a08?, 0x0?, 0xc0?, 0x61?, 0xc000885868?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000885830 sp=0xc000885810 pc=0x44160e
runtime.netpollblock(0xc0008858c8?, 0x409ec6?, 0x0?)
    /usr/local/go/src/runtime/netpoll.go:573 +0xf7 fp=0xc000885868 sp=0xc000885830 pc=0x43a377
internal/poll.runtime_pollWait(0x7fe3baca66d0, 0x72)
    /usr/local/go/src/runtime/netpoll.go:345 +0x85 fp=0xc000885888 sp=0xc000885868 pc=0x46e9e5
internal/poll.(*pollDesc).wait(0x3?, 0x3fe?, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0008858b0 sp=0xc000885888 pc=0x5030a7
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000456580)
    /usr/local/go/src/internal/poll/fd_unix.go:611 +0x2ac fp=0xc000885958 sp=0xc0008858b0 pc=0x50844c
net.(*netFD).accept(0xc000456580)
    /usr/local/go/src/net/fd_unix.go:172 +0x29 fp=0xc000885a10 sp=0xc000885958 pc=0x597c49
net.(*TCPListener).accept(0xc00042a240)
    /usr/local/go/src/net/tcpsock_posix.go:159 +0x1e fp=0xc000885a38 sp=0xc000885a10 pc=0x5adb7e
net.(*TCPListener).Accept(0xc00042a240)
    /usr/local/go/src/net/tcpsock.go:327 +0x30 fp=0xc000885a68 sp=0xc000885a38 pc=0x5acd70
net/http.(*onceCloseListener).Accept(0xc0001301b0?)
    <autogenerated>:1 +0x24 fp=0xc000885a80 sp=0xc000885a68 pc=0x7219a4
net/http.(*Server).Serve(0xc000366000, {0x116525c0, 0xc00042a240})
    /usr/local/go/src/net/http/server.go:3255 +0x33e fp=0xc000885bb0 sp=0xc000885a80 pc=0x6ff37e
github.com/jmorganca/ollama/server.Serve({0x116525c0, 0xc00042a240})
    /go/src/github.com/jmorganca/ollama/server/routes.go:1109 +0x4bf fp=0xc000885cc0 sp=0xc000885bb0 pc=0xe9df3f
github.com/jmorganca/ollama/cmd.RunServer(0xc000160b00?, {0x11da7400?, 0x4?, 0x104fbd5?})
    /go/src/github.com/jmorganca/ollama/cmd/cmd.go:787 +0x1b9 fp=0xc000885d58 sp=0xc000885cc0 pc=0xeb1b39
github.com/spf13/cobra.(*Command).execute(0xc000455508, {0x11da7400, 0x0, 0x0})
    /root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x882 fp=0xc000885e78 sp=0xc000885d58 pc=0x794922
github.com/spf13/cobra.(*Command).ExecuteC(0xc000454908)
    /root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000885f30 sp=0xc000885e78 pc=0x795165
github.com/spf13/cobra.(*Command).Execute(...)
    /root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
    /root/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
    /go/src/github.com/jmorganca/ollama/main.go:11 +0x4d fp=0xc000885f50 sp=0xc000885f30 pc=0xeb9ced
runtime.main()
    /usr/local/go/src/runtime/proc.go:271 +0x29d fp=0xc000885fe0 sp=0xc000885f50 pc=0x4411dd
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000885fe8 sp=0xc000885fe0 pc=0x4742e1

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x44160e
runtime.goparkunlock(...)
    /usr/local/go/src/runtime/proc.go:408
runtime.forcegchelper()
    /usr/local/go/src/runtime/proc.go:326 +0xb3 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x441493
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x4742e1
created by runtime.init.6 in goroutine 1
    /usr/local/go/src/runtime/proc.go:314 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x44160e
runtime.goparkunlock(...)
    /usr/local/go/src/runtime/proc.go:408
runtime.bgsweep(0xc00007e000)
    /usr/local/go/src/runtime/mgcsweep.go:318 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x42cbbf
runtime.gcenable.gowrap1()
    /usr/local/go/src/runtime/mgc.go:203 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x4214a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x4742e1
created by runtime.gcenable in goroutine 1
    /usr/local/go/src/runtime/mgc.go:203 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0x421baa?, 0x7f5478?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x44160e
runtime.goparkunlock(...)
    /usr/local/go/src/runtime/proc.go:408
runtime.(*scavengerState).park(0x11d413c0)
    /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x42a549
runtime.bgscavenge(0xc00007e000)
    /usr/local/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x42aaf9
runtime.gcenable.gowrap2()
    /usr/local/go/src/runtime/mgc.go:204 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x421445
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x4742e1
created by runtime.gcenable in goroutine 1
    /usr/local/go/src/runtime/mgc.go:204 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc000084648?, 0x414865?, 0xa8?, 0x1?, 0xc0000061c0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000084620 sp=0xc000084600 pc=0x44160e
runtime.runfinq()
    /usr/local/go/src/runtime/mfinal.go:194 +0x107 fp=0xc0000847e0 sp=0xc000084620 pc=0x4204e7
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x4742e1
created by runtime.createfing in goroutine 1
    /usr/local/go/src/runtime/mfinal.go:164 +0x3d

goroutine 6 gp=0xc0001136c0 m=nil [GC worker (idle)]:
runtime.gopark(0x11da93c0?, 0x1?, 0x82?, 0x9d?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000086750 sp=0xc000086730 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000867e0 sp=0xc000086750 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 18 gp=0xc000480000 m=nil [GC worker (idle)]:
runtime.gopark(0x951fab88911?, 0x3?, 0x2f?, 0xf8?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000080750 sp=0xc000080730 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000807e0 sp=0xc000080750 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 7 gp=0xc000113880 m=nil [GC worker (idle)]:
runtime.gopark(0x951fab997ec?, 0x3?, 0x8b?, 0x16?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000086f50 sp=0xc000086f30 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000086fe0 sp=0xc000086f50 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 19 gp=0xc000480540 m=nil [GC worker (idle)]:
runtime.gopark(0x951fab99ebc?, 0x3?, 0xc3?, 0xa?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000080f50 sp=0xc000080f30 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000080fe0 sp=0xc000080f50 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 34 gp=0xc000500000 m=nil [GC worker (idle)]:
runtime.gopark(0x951fac286b1?, 0x1?, 0x16?, 0x96?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000506750 sp=0xc000506730 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0005067e0 sp=0xc000506750 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0005067e8 sp=0xc0005067e0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 20 gp=0xc000480700 m=nil [GC worker (idle)]:
runtime.gopark(0x11da93c0?, 0x1?, 0x27?, 0xb9?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000081750 sp=0xc000081730 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000817e0 sp=0xc000081750 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 21 gp=0xc0004808c0 m=nil [GC worker (idle)]:
runtime.gopark(0x951faba6762?, 0x3?, 0xd7?, 0x1?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000081f50 sp=0xc000081f30 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000081fe0 sp=0xc000081f50 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 8 gp=0xc000113a40 m=nil [GC worker (idle)]:
runtime.gopark(0x11da93c0?, 0x3?, 0x3f?, 0xf5?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000087750 sp=0xc000087730 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000877e0 sp=0xc000087750 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 9 gp=0xc000113c00 m=nil [GC worker (idle)]:
runtime.gopark(0x951faba585a?, 0x1?, 0x50?, 0xa6?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000087f50 sp=0xc000087f30 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000087fe0 sp=0xc000087f50 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 10 gp=0xc000113dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x11da93c0?, 0x1?, 0x3f?, 0x29?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000502750 sp=0xc000502730 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0005027e0 sp=0xc000502750 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0005027e8 sp=0xc0005027e0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 11 gp=0xc000440000 m=nil [GC worker (idle)]:
runtime.gopark(0x951fab99c4e?, 0x1?, 0xda?, 0x21?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000502f50 sp=0xc000502f30 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000502fe0 sp=0xc000502f50 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000502fe8 sp=0xc000502fe0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 35 gp=0xc0005001c0 m=nil [GC worker (idle)]:
runtime.gopark(0x951fab997d8?, 0x3?, 0x18?, 0x6c?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000506f50 sp=0xc000506f30 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000506fe0 sp=0xc000506f50 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000506fe8 sp=0xc000506fe0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 36 gp=0xc000500380 m=nil [GC worker (idle)]:
runtime.gopark(0x951fab99bcc?, 0x1?, 0x9b?, 0x44?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000507750 sp=0xc000507730 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0005077e0 sp=0xc000507750 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0005077e8 sp=0xc0005077e0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 37 gp=0xc000500540 m=nil [GC worker (idle)]:
runtime.gopark(0x951fab99c58?, 0x3?, 0x24?, 0x25?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000507f50 sp=0xc000507f30 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000507fe0 sp=0xc000507f50 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000507fe8 sp=0xc000507fe0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 38 gp=0xc000500700 m=nil [GC worker (idle)]:
runtime.gopark(0x951fab99bf4?, 0x3?, 0xe6?, 0x7d?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000508750 sp=0xc000508730 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0005087e0 sp=0xc000508750 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0005087e8 sp=0xc0005087e0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 22 gp=0xc000480a80 m=nil [GC worker (idle)]:
runtime.gopark(0x11da93c0?, 0x1?, 0x58?, 0x3e?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000082750 sp=0xc000082730 pc=0x44160e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000827e0 sp=0xc000082750 pc=0x423585
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000827e8 sp=0xc0000827e0 pc=0x4742e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 23 gp=0xc000440540 m=nil [select, locked to thread]:
runtime.gopark(0xc0005057a8?, 0x2?, 0xa9?, 0x18?, 0xc000505794?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000505638 sp=0xc000505618 pc=0x44160e
runtime.selectgo(0xc0005057a8, 0xc000505790, 0x0?, 0x0, 0x0?, 0x1)
    /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000505758 sp=0xc000505638 pc=0x452a65
runtime.ensureSigM.func1()
    /usr/local/go/src/runtime/signal_unix.go:1034 +0x19f fp=0xc0005057e0 sp=0xc000505758 pc=0x46b71f
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0005057e8 sp=0xc0005057e0 pc=0x4742e1
created by runtime.ensureSigM in goroutine 1
    /usr/local/go/src/runtime/signal_unix.go:1017 +0xc8

goroutine 12 gp=0xc0005008c0 m=4 mp=0xc00007d808 [syscall]:
runtime.notetsleepg(0x11da8080, 0xffffffffffffffff)
    /usr/local/go/src/runtime/lock_futex.go:246 +0x29 fp=0xc0004687a0 sp=0xc000468778 pc=0x412e89
os/signal.signal_recv()
    /usr/local/go/src/runtime/sigqueue.go:152 +0x29 fp=0xc0004687c0 sp=0xc0004687a0 pc=0x470d49
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:23 +0x13 fp=0xc0004687e0 sp=0xc0004687c0 pc=0x723d53
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0004687e8 sp=0xc0004687e0 pc=0x4742e1
created by os/signal.Notify.func1.1 in goroutine 1
    /usr/local/go/src/os/signal/signal.go:151 +0x1f

goroutine 13 gp=0xc000500a80 m=nil [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000468f20 sp=0xc000468f00 pc=0x44160e
runtime.chanrecv(0xc0000aa480, 0x0, 0x1)
    /usr/local/go/src/runtime/chan.go:583 +0x3bf fp=0xc000468f98 sp=0xc000468f20 pc=0x40cd3f
runtime.chanrecv1(0x0?, 0x0?)
    /usr/local/go/src/runtime/chan.go:442 +0x12 fp=0xc000468fc0 sp=0xc000468f98 pc=0x40c952
github.com/jmorganca/ollama/server.Serve.func2()
    /go/src/github.com/jmorganca/ollama/server/routes.go:1091 +0x19 fp=0xc000468fe0 sp=0xc000468fc0 pc=0xe9dfb9
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000468fe8 sp=0xc000468fe0 pc=0x4742e1
created by github.com/jmorganca/ollama/server.Serve in goroutine 1
    /go/src/github.com/jmorganca/ollama/server/routes.go:1090 +0x40e

goroutine 41 gp=0xc000440700 m=nil [IO wait]:
runtime.gopark(0x104ec58?, 0x1?, 0x68?, 0x4f?, 0xb?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000504da8 sp=0xc000504d88 pc=0x44160e
runtime.netpollblock(0x4863f8?, 0x409ec6?, 0x0?)
    /usr/local/go/src/runtime/netpoll.go:573 +0xf7 fp=0xc000504de0 sp=0xc000504da8 pc=0x43a377
internal/poll.runtime_pollWait(0x7fe3baca65d8, 0x72)
    /usr/local/go/src/runtime/netpoll.go:345 +0x85 fp=0xc000504e00 sp=0xc000504de0 pc=0x46e9e5
internal/poll.(*pollDesc).wait(0xc000456880?, 0xc00050ee81?, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000504e28 sp=0xc000504e00 pc=0x5030a7
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000456880, {0xc00050ee81, 0x1, 0x1})
    /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc000504ec0 sp=0xc000504e28 pc=0x50439a
net.(*netFD).Read(0xc000456880, {0xc00050ee81?, 0x8?, 0x0?})
    /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc000504f08 sp=0xc000504ec0 pc=0x595c65
net.(*conn).Read(0xc00007a698, {0xc00050ee81?, 0x0?, 0xc000504fd0?})
    /usr/local/go/src/net/net.go:179 +0x45 fp=0xc000504f50 sp=0xc000504f08 pc=0x5a4ac5
net.(*TCPConn).Read(0x0?, {0xc00050ee81?, 0xc0006861c0?, 0x7ebe40?})
    <autogenerated>:1 +0x25 fp=0xc000504f80 sp=0xc000504f50 pc=0x5b6145
net/http.(*connReader).backgroundRead(0xc00050ee70)
    /usr/local/go/src/net/http/server.go:681 +0x37 fp=0xc000504fc8 sp=0xc000504f80 pc=0x6f4277
net/http.(*connReader).startBackgroundRead.gowrap2()
    /usr/local/go/src/net/http/server.go:677 +0x25 fp=0xc000504fe0 sp=0xc000504fc8 pc=0x6f41a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000504fe8 sp=0xc000504fe0 pc=0x4742e1
created by net/http.(*connReader).startBackgroundRead in goroutine 39
    /usr/local/go/src/net/http/server.go:677 +0xba

rax    0x0
rbx    0x7fe1091f58d0
rcx    0x1
rdx    0x1
rdi    0x7fe1091f5918
rsi    0x2000000
rbp    0x7fe3738fcb20
rsp    0x7fe3738fcaf8
r8     0x7fe3738fc8e0
r9     0x7fe3738fca60
r10    0x7fe1091f5df0
r11    0x7fe35c000090
r12    0x7fe3738fcb68
r13    0x7fe3738fcb40
r14    0x7fe1091f5918
r15    0x7fe3738fcb68
rip    0x0
rflags 0x10206
cs     0x33
fs     0x0
gs     0x0
MrDoe commented 5 months ago

I found two possible fixes in the sources. I will try them later and report back.

One is to run `export HSA_ENABLE_SDMA=0`; the other is to disable some GPU power-management features via kernel parameters:
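The first workaround only needs the variable in the environment of the process that loads the ROCm runtime. Below is a minimal Go sketch of that, assuming `ollama` is on your PATH and is started with `ollama serve`. It sets `HSA_ENABLE_SDMA=0` for the child process only, replacing any existing value. The helper `setEnv` is a hypothetical name used for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// setEnv returns a copy of env with key set to value, dropping any existing
// entry for key instead of leaving a duplicate behind.
func setEnv(env []string, key, value string) []string {
	prefix := key + "="
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, prefix) {
			out = append(out, kv)
		}
	}
	return append(out, prefix+value)
}

func main() {
	// Assumption: the ollama binary is on PATH.
	cmd := exec.Command("ollama", "serve")
	// Workaround from the comment above; it affects only this child process.
	cmd.Env = setEnv(os.Environ(), "HSA_ENABLE_SDMA", "0")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "ollama serve failed:", err)
	}
}
```

`os/exec` already uses only the last value when `Cmd.Env` contains duplicate keys, so a plain `append` would also work; the helper just keeps the environment free of stale entries.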

sudo vim /etc/default/grub
# append "amdgpu.ppfeaturemask=0xffff3fff amdgpu.runpm=0x0" to GRUB_CMDLINE_LINUX_DEFAULT
sudo update-grub
reboot
cat /proc/cmdline
# check that the new parameters are present, i.e. the change took effect
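A small Go sketch of the last step: it reads `/proc/cmdline` and reports which of the two parameters are not active yet, and it decodes which bits `0xffff3fff` clears. Bits 14 and 15 are cleared; as far as I can tell these are `PP_OVERDRIVE_MASK` (0x4000) and `PP_GFXOFF_MASK` (0x8000) in the kernel's `amd_shared.h`, but check the header for your kernel version. `amdgpu.runpm=0` disables the driver's runtime power management. The helpers `missingParams` and `clearedBits` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// missingParams returns the wanted key=value parameters that do not appear
// verbatim (as whole whitespace-separated fields) in a kernel command line.
func missingParams(cmdline string, want []string) []string {
	present := make(map[string]bool)
	for _, f := range strings.Fields(cmdline) {
		present[f] = true
	}
	var missing []string
	for _, w := range want {
		if !present[w] {
			missing = append(missing, w)
		}
	}
	return missing
}

// clearedBits lists the bit positions that are 0 in a 32-bit mask such as
// "0xffff3fff". Base 0 lets ParseUint accept the 0x prefix.
func clearedBits(mask string) ([]int, error) {
	v, err := strconv.ParseUint(mask, 0, 32)
	if err != nil {
		return nil, err
	}
	var bits []int
	for i := 0; i < 32; i++ {
		if v&(uint64(1)<<i) == 0 {
			bits = append(bits, i)
		}
	}
	return bits, nil
}

func main() {
	data, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/cmdline:", err)
		return
	}
	want := []string{"amdgpu.ppfeaturemask=0xffff3fff", "amdgpu.runpm=0x0"}
	if m := missingParams(string(data), want); len(m) > 0 {
		fmt.Println("not active (did update-grub and the reboot run?):", m)
	} else {
		fmt.Println("both amdgpu parameters are active")
	}
	bits, _ := clearedBits("0xffff3fff")
	fmt.Println("ppfeaturemask clears bits:", bits)
}
```

The match is literal, so an equivalent spelling such as `amdgpu.runpm=0` would be reported as missing even though the kernel accepts it.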

Thanks for the instructions! It works on my gfx90c as long as the models fit into memory (I reserved 8 GB of my 16 GB for the GPU). If your notebook, like my Asus Vivobook, has no BIOS setting for changing the dedicated GPU memory, you can follow this guide to set it via a special bootable flash drive: https://winstonhyypia.medium.com/amd-apu-how-to-modify-the-dedicated-gpu-memory-e27b75905056
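As a rough pre-check for "fits into memory", here is a hypothetical Go heuristic: the model file size plus an assumed 20% headroom (for KV cache and compute buffers) must not exceed the free GPU memory the driver reports. The 20% figure and `fitsInVRAM` are my assumptions, not what llama.cpp or ollama actually computes; the free-memory value is the `amdgpu freeMemory` number from the ollama log at the top of this issue.

```go
package main

import "fmt"

// fitsInVRAM is a rough, hypothetical heuristic: model size times (1+headroom)
// must be no larger than the free GPU memory, all in bytes.
func fitsInVRAM(modelBytes, freeBytes uint64, headroom float64) bool {
	need := float64(modelBytes) * (1 + headroom)
	return need <= float64(freeBytes)
}

func main() {
	const free = 7956676608 // "amdgpu freeMemory" from the ollama log (about 7.4 GiB)
	const gib = 1 << 30
	for _, size := range []uint64{4 * gib, 7 * gib} {
		fmt.Printf("%d GiB model fits: %v\n", size/gib, fitsInVRAM(size, free, 0.2))
	}
}
```

Real usage also depends on context length and how many layers are offloaded, so treat a `true` here as necessary rather than sufficient.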

Even though it works now, I don't see any benefit from running llama.cpp with GPU support; I expected GPU offloading to help more.

MaciejMogilany commented 4 months ago

Some improvements are coming in Linux kernel 6.10: on APUs, UMA memory can be used without hacks. https://github.com/ROCm/ROCm/issues/2014#issuecomment-2131988809
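To check whether a machine already runs 6.10 or newer, a minimal Go sketch can parse the release string from `/proc/sys/kernel/osrelease` (the same value `uname -r` prints). The helper `atLeast` is hypothetical, and whether UMA then works for a given APU also depends on the ROCm side described in the linked comment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// atLeast reports whether a kernel release such as "6.10.0-arch1-1" or
// "6.10-rc1" is at least major.minor. Only the numeric major/minor are compared.
func atLeast(release string, major, minor int) (bool, error) {
	parts := strings.SplitN(strings.TrimSpace(release), ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected release %q", release)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	// Strip any non-digit suffix from the minor field, e.g. "10-rc1" -> "10".
	minStr := parts[1]
	if i := strings.IndexFunc(minStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minStr = minStr[:i]
	}
	mnr, err := strconv.Atoi(minStr)
	if err != nil {
		return false, err
	}
	return maj > major || (maj == major && mnr >= minor), nil
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read kernel release:", err)
		return
	}
	ok, err := atLeast(string(data), 6, 10)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("kernel %s is 6.10 or newer: %v\n", strings.TrimSpace(string(data)), ok)
}
```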

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.