cztomsik opened 5 months ago
Howdy, I believe the frontend only needs to detect the GPU, VRAM, and CPU cores (if it is implemented as client-server rather than integrating llama.cpp statically). One could ship multiple llama backends compiled with different flags to use GPU and/or CPU optimizations. It would be more elegant to modularize llama so that GPU and AVX/AVX2/AVX-512 support are compiled into separate dynamic libraries, but I'm not sure whether the code structure makes this difficult.
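To make the dynamic-library idea concrete, here is a minimal sketch of the frontend side in Zig, assuming hypothetical backend names like `llama-cuda.dll` and `llama-avx2.dll`, and skipping real hardware detection (CPUID, VRAM queries): it just tries the most capable library first and resolves llama.cpp's C API from whichever one loads.

```zig
const std = @import("std");

pub fn main() !void {
    // Hypothetical backend names; the real split depends on how llama.cpp
    // ends up being modularized. Most capable first, plain CPU last.
    const candidates = [_][]const u8{
        "llama-cuda.dll",
        "llama-avx2.dll",
        "llama.dll",
    };

    var lib: ?std.DynLib = null;
    for (candidates) |name| {
        lib = std.DynLib.open(name) catch continue;
        std.debug.print("loaded {s}\n", .{name});
        break;
    }
    var backend = lib orelse return error.NoBackendFound;
    defer backend.close();

    // Every backend exports the same llama.cpp C API, so the frontend can
    // resolve symbols from whichever library loaded successfully.
    const print_system_info = backend.lookup(
        *const fn () callconv(.C) [*:0]const u8,
        "llama_print_system_info",
    ) orelse return error.SymbolNotFound;
    std.debug.print("{s}\n", .{print_system_info()});
}
```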
This will take some time, so this is just a rough sketch for later:
- `llama.dll`
  - `zig build` can download/extract a zip file from a url like `https://github.com/ggerganov/llama.cpp/releases/download/{short_rev}/llama-{short_rev}-bin-{blas}.zip` (a `build.zig` sketch follows this list)
  - `short_rev` is obtained from the `llama.cpp` git submodule
  - `blas` is something like `win-cuda-cu11.7.1-x64`, passed as `-Dblas=xxx` to `zig build`
  - the `.exe` and `.dll` should be marked as artifacts
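A rough `build.zig` sketch of the above, assuming Zig 0.11-era `std.Build` APIs; the executable name, the default `blas` value, and the hard-coded `short_rev` are placeholders, and shelling out to `curl`/`tar` (both ship with Windows 10+) is just one way to do the download/extract step:

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // -Dblas=win-cuda-cu11.7.1-x64 etc.; the default here is made up.
    const blas = b.option([]const u8, "blas", "llama.cpp binary flavor to download") orelse "win-avx2-x64";

    // In the real build this would be read from the llama.cpp submodule
    // (`git -C llama.cpp rev-parse --short HEAD`); hard-coded placeholder.
    const short_rev = "b1234";

    const url = b.fmt(
        "https://github.com/ggerganov/llama.cpp/releases/download/{s}/llama-{s}-bin-{s}.zip",
        .{ short_rev, short_rev, blas },
    );

    // One way to download/extract: shell out to curl and tar.
    const fetch = b.addSystemCommand(&.{ "curl", "-L", "-o", "llama.zip", url });
    const unzip = b.addSystemCommand(&.{ "tar", "-xf", "llama.zip" });
    unzip.step.dependOn(&fetch.step);

    const exe = b.addExecutable(.{
        .name = "app", // placeholder name
        .root_source_file = .{ .path = "src/main.zig" },
        .target = target,
        .optimize = optimize,
    });
    exe.step.dependOn(&unzip.step);

    // installArtifact/installBinFile put the .exe and .dll under zig-out/,
    // i.e. mark them as build artifacts.
    b.installArtifact(exe);
    b.installBinFile("llama.dll", "llama.dll");
}
```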
After this is done, we can make a Windows pipeline with a matrix for each BLAS, and hopefully we will get a `.zip` file which people can just download and run. Of course, they still need to have the given BLAS installed on their system.
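For the pipeline itself, a hypothetical GitHub Actions sketch; the matrix entries beyond `win-cuda-cu11.7.1-x64` and the third-party `setup-zig` action are assumptions:

```yaml
# .github/workflows/windows.yml (sketch) -- one job per BLAS flavor
name: windows
on: [push]
jobs:
  build:
    runs-on: windows-latest
    strategy:
      matrix:
        blas: [win-avx2-x64, win-openblas-x64, win-cuda-cu11.7.1-x64]
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: true # needed to resolve short_rev from llama.cpp
      - uses: goto-bus-stop/setup-zig@v2
      - run: zig build -Dblas=${{ matrix.blas }}
      - uses: actions/upload-artifact@v3
        with:
          name: app-${{ matrix.blas }}
          path: zig-out/bin/
```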