ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: RPC server doesn't load GPU if I use Vulkan #8536

Open metal3d opened 2 months ago

metal3d commented 2 months ago

What happened?

I compiled llama.cpp with the Vulkan backend. The "rpc-server" binary is linked against libvulkan, but it never uses my GPUs and falls back to the CPU backend, while "llama-cli" uses them fine.

Name and Version

version: 3384 (4e24cffd) built with cc (GCC) 14.1.1 20240701 (Red Hat 14.1.1-7) for x86_64-redhat-linux

What operating system are you seeing the problem on?

Linux

Relevant log output

./rpc-server
create_backend: using CPU backend
Starting RPC server on 0.0.0.0:50052, backend memory: 23967 MB

ldd ./rpc-server
        linux-vdso.so.1 (0x00007f18759f2000)
        libllama.so => /home/metal3d/Projects/ML/llama.cpp/build-rpc/src/libllama.so (0x00007f1875879000)
        libggml.so => /home/metal3d/Projects/ML/llama.cpp/build-rpc/ggml/src/libggml.so (0x00007f1875400000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f1875000000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f187531c000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f187582b000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1874e0f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f18759f4000)
        libvulkan.so.1 => /lib64/libvulkan.so.1 (0x00007f18757af000)
        libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f18752c6000)

rgerganov commented 1 month ago

The Vulkan backend uses the tensor->extra property, which is not supported by the RPC backend. The same issue exists with the SYCL backend (PR #7682).

xvim commented 1 week ago

Is there any plan to support Vulkan when using the RPC backend?

rgerganov commented 1 week ago

I will try to find out how to avoid using tensor->extra in Vulkan, maybe by adding a global map ggml_tensor -> ggml_tensor_extra_gpu.
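
To make the global-map idea concrete, here is a minimal sketch of a side table keyed by tensor address. It is not code from llama.cpp. The types fake_tensor and fake_extra_gpu are hypothetical stand-ins for ggml_tensor and ggml_tensor_extra_gpu (the real structs have different fields), and extra_registry, global_extras and registry_demo are names invented for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-ins: the real ggml_tensor and ggml_tensor_extra_gpu
// are defined by ggml and look different. These exist only for the sketch.
struct fake_tensor    { void * data; std::size_t nbytes; };
struct fake_extra_gpu { std::size_t buffer_offset; };

// Side table mapping a tensor's address to its backend-specific data,
// so the backend would not need to store anything in tensor->extra.
class extra_registry {
public:
    // Insert or overwrite the entry for tensor t.
    void set(const fake_tensor * t, fake_extra_gpu e) {
        std::lock_guard<std::mutex> lock(mu_);
        map_[t] = e;
    }

    // Copy the entry for t into out; returns false if t is not registered.
    bool get(const fake_tensor * t, fake_extra_gpu & out) const {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = map_.find(t);
        if (it == map_.end()) {
            return false;
        }
        out = it->second;
        return true;
    }

    // Must be called before the tensor is freed; otherwise a later tensor
    // allocated at the same address would see a stale entry.
    void erase(const fake_tensor * t) {
        std::lock_guard<std::mutex> lock(mu_);
        map_.erase(t);
    }

private:
    mutable std::mutex mu_;
    std::unordered_map<const fake_tensor *, fake_extra_gpu> map_;
};

// Single process-wide instance (the "global map").
inline extra_registry & global_extras() {
    static extra_registry registry;
    return registry;
}

// Small self-check exercising set / get / erase.
inline bool registry_demo() {
    fake_tensor t{nullptr, 16};
    global_extras().set(&t, fake_extra_gpu{128});

    fake_extra_gpu out{};
    bool found = global_extras().get(&t, out);
    assert(found);

    global_extras().erase(&t);
    bool gone = !global_extras().get(&t, out);
    return found && out.buffer_offset == 128 && gone;
}
```

The trade-off in this sketch is that entries are keyed by address, so every tensor must be erased from the map before it is freed, and each lookup pays for a lock and a hash probe.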

slaren commented 1 week ago

The extras in the Vulkan backend are not really necessary: all the data that they contain is already present (directly or indirectly) in other fields of the tensor. At this point I think they are only there for legacy reasons and could be removed with a refactor.