Vlod-github opened this issue 3 weeks ago
tabby_x86_64-windows-msvc-vulkan/tabby serve
never finishes loading for me either. The first problem it reports is also "not compiled with GPU offload support", even though the Vulkan build should support offloading to the GPU.
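As a quick sanity check, you can ask the bundled binary for its build info and compare it against an upstream Vulkan build (a sketch; it assumes the bundled llama-server.exe accepts the standard llama.cpp --version flag):

```
cd C:\bin\offpath\tabby_x86_64-windows-msvc-vulkan
REM print version and build info of the llama-server.exe shipped with Tabby
llama-server.exe --version
```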
Information about your version: tabby 0.19.0
Information about your GPU: Intel Arc A770M, 16 GB VRAM
'nvidia-smi' is not recognized as an internal or external command,
operable program or batch file.
Intel Driver & Support Assistant reports:
Driver Details
Up to date
Provider: Intel Corporation
Version: 32.0.101.6078
Date: 2024-09-13

Device Details
Adapter Compatibility: Intel Corporation
Video Processor: Intel® Arc™ A770M Graphics Family
Resolution: 3840 x 2160
Bits Per Pixel: 32
Number of Colors: 4294967296
Refresh Rate - Current: 60 Hz
Refresh Rate - Maximum: 240 Hz
Refresh Rate - Minimum: 23 Hz
Adapter DAC Type: Internal
Availability: Running at full power
Status: This device is working properly.
Location: PCI bus 3, device 0, function 0
Device Id: PCI\VEN_8086&DEV_5690&SUBSYS_30268086&REV_08\6&1A6B8599&0&00080008
Additional context
C:\bin\offpath\tabby_x86_64-windows-msvc-vulkan>tabby serve --model StarCoder-1B --device vulkan
⠦ 4.527 s Starting...2024-11-06T09:01:10.861861Z WARN llama_cpp_server::supervisor: crates\llama-cpp-server\src\supervisor.rs:98: llama-server <embedding> exited with status code -1073741819, args: `Command { std: "C:\\bin\\offpath\\tabby_x86_64-windows-msvc-vulkan\\llama-server.exe" "-m" "C:\\Users\\s_pam\\.tabby\\models\\TabbyML\\Nomic-Embed-Text\\ggml\\model-00001-of-00001.gguf" "--cont-batching" "--port" "30888" "-np" "1" "--log-disable" "--ctx-size" "4096" "-ngl" "9999" "--embedding" "--ubatch-size" "4096", kill_on_drop: true }`
2024-11-06T09:01:10.862447Z WARN llama_cpp_server::supervisor: crates\llama-cpp-server\src\supervisor.rs:110: <embedding>: warning: not compiled with GPU offload support, --gpu-layers option will be ignored
2024-11-06T09:01:10.862792Z WARN llama_cpp_server::supervisor: crates\llama-cpp-server\src\supervisor.rs:110: <embedding>: warning: see main README.md for information on enabling GPU BLAS support
2024-11-06T09:01:10.863095Z WARN llama_cpp_server::supervisor: crates\llama-cpp-server\src\supervisor.rs:110: <embedding>: llama_model_loader: loaded meta data with 23 key-value pairs and 112 tensors from C:\Users\s_pam\.tabby\models\TabbyML\Nomic-Embed-Text\ggml\model-00001-of-00001.gguf (version GGUF V3 (latest))
2024-11-06T09:01:10.863227Z WARN llama_cpp_server::supervisor: crates\llama-cpp-server\src\supervisor.rs:110: <embedding>: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
2024-11-06T09:01:10.863359Z WARN llama_cpp_server::supervisor: crates\llama-cpp-server\src\supervisor.rs:110: <embedding>: llama_model_loader: - kv 0: general.architecture str = nomic-bert
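For what it's worth, exit status -1073741819 is 0xC0000005 (STATUS_ACCESS_VIOLATION), so the embedding server is crashing outright rather than failing to load the model. The crash can be reproduced outside of Tabby by re-running the command from the supervisor log above by hand (same binary, same flags, nothing added):

```
cd C:\bin\offpath\tabby_x86_64-windows-msvc-vulkan
REM the exact arguments Tabby passed, taken from the WARN line above
llama-server.exe -m C:\Users\s_pam\.tabby\models\TabbyML\Nomic-Embed-Text\ggml\model-00001-of-00001.gguf --cont-batching --port 30888 -np 1 --log-disable --ctx-size 4096 -ngl 9999 --embedding --ubatch-size 4096
```

You may want to drop --log-disable when running it by hand so that the server's own output is visible.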
A workaround is to replace the bundled llama-server.exe with an upstream llama.cpp build, e.g. llama-b4034-bin-win-vulkan-x64.
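Roughly like this, as a sketch (the asset name is the one above; the URL follows llama.cpp's release naming, so verify it against the releases page, and the destination path is the install directory from this report):

```
REM download the upstream Vulkan build of llama.cpp
curl -LO https://github.com/ggerganov/llama.cpp/releases/download/b4034/llama-b4034-bin-win-vulkan-x64.zip
REM Windows 10+ ships bsdtar, which can unpack zip archives
tar -xf llama-b4034-bin-win-vulkan-x64.zip
REM overwrite the llama-server.exe bundled with Tabby
REM (if the archive unpacks into a subfolder, copy from there instead)
copy /Y llama-server.exe C:\bin\offpath\tabby_x86_64-windows-msvc-vulkan\llama-server.exe
```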
Hi @gmatht and @Vlod-github, thank you for reporting the issue.
Did you install the Vulkan runtime before using Tabby? Could you please try running the following command and post the result here:
vulkaninfo.exe
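If the full dump is too long to attach, the summary view is usually enough to show whether the runtime sees the Arc GPU (assuming a vulkaninfo from a recent Vulkan SDK, which supports the --summary flag):

```
REM condensed list of Vulkan-capable devices and driver versions
vulkaninfo --summary
```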
I have verified this on Linux and found an issue with the Vulkan build. We will investigate further and fix it later.
https://gist.github.com/zwpaper/08e80712e1f3f82a41a1a0ee41735b2f
After running
.\tabby.exe serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct
and downloading the models, llama-server.exe crashes.
Tabby: v0.18.0 and v0.19.0-rc.1, using both tabby_x86_64-windows-msvc and tabby_x86_64-windows-msvc-vulkan. It turns out that this doesn't work even with the CPU build.
Environment: Ryzen 5 3500U with Vega 8, Windows 10
Furthermore, I loaded the same GGUF models into gpt4all and they work, so the issue is with the backend. Here is the output that Tabby periodically produces: