Starting the server by hand and choosing one of the Vulkan GPUs works well, actually very well! Kudos to you!
In auto mode, it tries to load the model onto the first reported (integrated) GPU (here: Haswell / Intel Celeron).
Judging from radeontop, it then starts to offload to the GPU, which breaks the system. See the kernel log below.
Further down the road, RWKV-Runner behaves the same. It is a bit friendlier in that it stops loading the model earlier and stays responsive, but it also cannot load a 7B model onto a GPU.
Suggestion:
Implement a way to load the model onto a specific discrete GPU, e.g.
-a \:\
-a vulkan:01
or skip loading to GPU0 if it is integrated and discrete GPUs are available.
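To make the suggestion concrete, here is a minimal Python sketch of the selection logic I have in mind. It is not ai00_server code: the `Adapter` record, `parse_adapter_spec`, `pick_adapter` and the `<backend>:<index>` reading of `-a vulkan:01` are all assumptions for illustration, with the index taken as the position in the reported adapter list.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Adapter:
    # Hypothetical record; a real server would fill this from its GPU API.
    name: str
    backend: str      # e.g. "vulkan"
    integrated: bool  # True for an iGPU such as the Haswell GT1


def parse_adapter_spec(spec: str) -> Tuple[str, int]:
    """Split a spec such as 'vulkan:01' into (backend, index)."""
    backend, sep, index = spec.partition(":")
    if not sep or not backend or not index.isdecimal():
        raise ValueError(f"expected <backend>:<index>, got {spec!r}")
    return backend.lower(), int(index)


def pick_adapter(adapters: Sequence[Adapter], spec: Optional[str] = None) -> int:
    """Return the position in `adapters` of the GPU to load the model on."""
    if not adapters:
        raise ValueError("no adapters reported")

    if spec is not None:
        # Explicit choice: trust the user, but check that it exists.
        backend, index = parse_adapter_spec(spec)
        if index >= len(adapters):
            raise ValueError(f"adapter index {index} out of range")
        if adapters[index].backend != backend:
            raise ValueError(f"adapter {index} is not a {backend} adapter")
        return index

    # Auto mode: take the first discrete GPU, if there is one.
    for i, adapter in enumerate(adapters):
        if not adapter.integrated:
            return i

    # Only integrated GPUs were reported, so fall back to the first one.
    return 0
```

With the adapter list from this machine (Intel first, then four RX 580s), auto mode in this sketch lands on position 1, the first RX 580, and `-a vulkan:01` picks the same card.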
Load order:
./ai00_server --port 8082 --quant 32 --model assets/models/RWKV-4-World-ARAtuned-7B-v1-20230803-ctx4096.st
MESA-INTEL: warning: Haswell Vulkan support is incomplete
? Please select an adapter ›
❯ Intel(R) HD Graphics (HSW GT1) (Vulkan)
Radeon RX 580 Series (Vulkan)
Radeon RX 580 Series (Vulkan)
Radeon RX 580 Series (Vulkan)
Radeon RX 580 Series (Vulkan)
vulkaninfo:
system: an old crypto rig with Celeron, 8GB & 4 x RX580 (8GB)
OS: debian 12
kernel: Linux jeeves 6.1.0-11-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4 (2023-08-08) x86_64 GNU/Linux
vulkan driver: amdgpu (opensource)
versions:
kernellog: