Closed: b4rtaz closed this 6 days ago
unclemusclez@ttv:~/distributed-llama/src/vulkan$ ls
matmul-f32-f32.comp  matmul-q40-f32.spv
matmul-f32-f32.spv   matmul-q40-q80.comp
matmul-q40-f32.comp  matmul-q40-q80.spv
unclemusclez@ttv:~/distributed-llama/src/vulkan$ cd ../..
unclemusclez@ttv:~/distributed-llama$ make dllama DLLAMA_VULKAN=1
Makefile:55: warning: overriding recipe for target 'funcs-test'
Makefile:29: warning: ignoring old recipe for target 'funcs-test'
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/utils.cpp -o utils.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/quants.cpp -o quants.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/funcs.cpp -o funcs.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/commands.cpp -o commands.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/socket.cpp -o socket.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/transformer.cpp -o transformer.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/tasks.cpp -o tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/llama2-tasks.cpp -o llama2-tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/grok1-tasks.cpp -o grok1-tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/mixtral-tasks.cpp -o mixtral-tasks.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/tokenizer.cpp -o tokenizer.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/app.cpp -o app.o
g++ -std=c++11 -Werror -O3 -march=native -mtune=native -DDLLAMA_VULKAN -c src/accelerator-vulkan.cpp -o accelerator-vulkan.o
src/accelerator-vulkan.cpp: In constructor 'VulkanContext::VulkanContext()':
src/accelerator-vulkan.cpp:85:37: error: format '%llu' expects argument of type 'long long unsigned int', but argument 3 has type 'vk::DeviceSize' {aka 'long unsigned int'} [-Werror=format=]
   85 |         printf("🌋 heap[%u]: %llu MB\n", h, memoryProperties.memoryHeaps[h].size / (1024 * 1024));
      |                              ~~~^
      |                                 |
      |                                 long long unsigned int
      |                              %lu
cc1plus: all warnings being treated as errors
make: *** [Makefile:17: accelerator-vulkan.o] Error 1
Unfortunately the approach applied in this PR is not a good direction. I have to revise it.
The experimental Vulkan support for matrix multiplication.

To try running Distributed Llama with Vulkan support, you need to clone this branch. You also need a Vulkan dev environment installed, to compile the shaders and to build Distributed Llama with the `-lvulkan` lib. Run with the `--accelerator=1/1` argument. The value of this argument defines what fraction of the computation should be moved to the GPU: `1/1` means 100%, `1/2` means 50%, etc. The current implementation tries to run the inference on CPU and GPU simultaneously. You can still control how many threads should be used by setting the `--nthreads 4` argument. So basically the goal is to find the best values for the `--nthreads` and `--accelerator 1/3` arguments to achieve the best speed.