feat: vulkan.

Experimental Vulkan support for matrix multiplication.

To try Distributed Llama with Vulkan support, you need to clone this branch. You also need a Vulkan development environment installed, both to compile the shaders and to build Distributed Llama against the Vulkan library (-lvulkan).

Build Distributed Llama:
```
make dllama DLLAMA_VULKAN=1
```

Run Distributed Llama with the --accelerator 1/1 argument:

./dllama inference --accelerator 1/1 \
--buffer-float-type q80 --prompt "Hello" --steps 128 --nthreads 1 \
--model models/llama3_8b_q40/dllama_model_llama3_8b_q40.m \
--tokenizer models/llama3_8b_q40/dllama_tokenizer_llama3_8b_q40.t

The value of this argument defines what fraction of the computation is moved to the GPU: 1/1 means 100%, 1/2 means 50%, and so on.

The current implementation tries to run the inference on the CPU and the GPU simultaneously. You can still control how many CPU threads are used with the --nthreads argument (for example --nthreads 4). So the goal is to find the combination of --nthreads and --accelerator values (for example --accelerator 1/3) that gives the best speed.
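
To make the meaning of the ratio concrete, here is a minimal sketch of how a value such as 1/3 could be turned into a GPU/CPU split. It is not taken from the Distributed Llama sources: the names AcceleratorRatio, parseAcceleratorRatio and gpuRows are hypothetical, and splitting the work by output rows of the matrix is an assumption made only for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical type: the N/D value passed to --accelerator.
struct AcceleratorRatio {
    unsigned int numerator;
    unsigned int denominator;
};

// Parses a string such as "1/2". Throws std::invalid_argument on malformed
// input or on a ratio greater than 1/1.
AcceleratorRatio parseAcceleratorRatio(const std::string& value) {
    const std::size_t slash = value.find('/');
    if (slash == std::string::npos || slash == 0 || slash + 1 >= value.size())
        throw std::invalid_argument("expected N/D, e.g. 1/2");
    const unsigned long n = std::stoul(value.substr(0, slash));
    const unsigned long d = std::stoul(value.substr(slash + 1));
    if (d == 0 || n > d)
        throw std::invalid_argument("ratio must be between 0/1 and 1/1");
    AcceleratorRatio ratio;
    ratio.numerator = static_cast<unsigned int>(n);
    ratio.denominator = static_cast<unsigned int>(d);
    return ratio;
}

// Assumption: work is split by output rows. Returns how many rows go to the
// GPU (rounded down); the remaining rows stay on the CPU threads.
unsigned int gpuRows(unsigned int totalRows, const AcceleratorRatio& ratio) {
    const unsigned long long scaled =
        static_cast<unsigned long long>(totalRows) * ratio.numerator;
    return static_cast<unsigned int>(scaled / ratio.denominator);
}
```

With this sketch, gpuRows(4096, parseAcceleratorRatio("1/2")) assigns 2048 rows to the GPU and leaves 2048 for the CPU, while 1/1 leaves nothing for the CPU threads.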

unclemusclez@ttv:~/distributed-llama/src/vulkan$ ls
matmul-f32-f32.comp  matmul-q40-f32.spv   matmul-q40-q80.spv
matmul-f32-f32.spv   matmul-q40-q80.comp
matmul-q40-f32.comp  matmul-q40-q80.sp
unclemusclez@ttv:~/distributed-llama/src/vulkan$ cd ../..
unclemusclez@ttv:~/distributed-llama$ make dllama DLLAMA_VULKAN=1
Makefile:55: warning: overriding recipe for target 'funcs-test'
Makefile:29: warning: ignoring old recipe for target 'funcs-test'
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/utils.cpp -o utils.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/quants.cpp -o quants.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/funcs.cpp -o funcs.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/commands.cpp -o commands.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/socket.cpp -o socket.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/transformer.cpp -o transformer.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/tasks.cpp -o tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/llama2-tasks.cpp -o llama2-tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/grok1-tasks.cpp -o grok1-tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/mixtral-tasks.cpp -o mixtral-tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/tokenizer.cpp -o tokenizer.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/app.cpp -o app.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/accelerator-vulkan.cpp -o accelerator-vulkan.o
src/accelerator-vulkan.cpp: In constructor 'VulkanContext::VulkanContext()':
src/accelerator-vulkan.cpp:85:37: error: format '%llu' expects argument of type 'long long unsigned int', but argument 3 has type 'vk::DeviceSize' {aka 'long unsigned int'} [-Werror=format=]
   85 |             printf("🌋 heap[%u]: %llu MB\n", h, memoryProperties.memoryHeaps[h].size / (1024 * 1024));
      |                                  ~~~^
      |                                     |
      |                                     long long unsigned int
      |                                  %lu
cc1plus: all warnings being treated as errors
make: *** [Makefile:17: accelerator-vulkan.o] Error 1
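
The error above comes from printing a vk::DeviceSize (a 64-bit unsigned integer) with %llu on a platform where that type is 'long unsigned int', combined with -Werror. One possible portable fix, sketched here without the Vulkan headers, is the PRIu64 macro from <cinttypes>. The DeviceSize typedef and the formatHeapLine helper are stand-ins for illustration, not code from accelerator-vulkan.cpp, and the emoji from the original message is left out.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for vk::DeviceSize, which is an alias of uint64_t.
typedef std::uint64_t DeviceSize;

// Hypothetical helper: formats one heap line. PRIu64 expands to the printf
// conversion that matches uint64_t on the current platform, so the same
// source compiles cleanly with -Werror=format on every target.
std::string formatHeapLine(unsigned int index, DeviceSize sizeBytes) {
    char buffer[64];
    std::snprintf(buffer, sizeof(buffer), "heap[%u]: %" PRIu64 " MB",
                  index, sizeBytes / (1024 * 1024));
    return std::string(buffer);
}
```

An alternative with the same effect would be to keep %llu and cast the argument to unsigned long long at the call site.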

b4rtaz / distributed-llama

feat: vulkan. #91