cztomsik / ava

All-in-one desktop app for running LLMs locally.
https://avapls.com

Support Nvidia, AMD, AVX, ... (CUDA, ROCm, Vulkan and other BLAS) for Windows #23

Open cztomsik opened 5 months ago

cztomsik commented 5 months ago

This will take some time, so this is just a rough sketch for later:

After this is done, we can make a Windows pipeline with a build matrix for each BLAS backend, and hopefully we will get a .zip file which people can just download and run. Of course, they will still need the given BLAS installed on their system.
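
For illustration, such a matrix could look roughly like the GitHub Actions sketch below. The backend names, the build step, and the output path are placeholders only, not the project's actual build setup:

```yaml
# Hypothetical sketch of a Windows build matrix; backend names, the build
# command, and the artifact path are placeholders, not ava's real setup.
name: windows-builds
on: [push]

jobs:
  build:
    runs-on: windows-latest
    strategy:
      matrix:
        backend: [cpu-avx2, cuda, rocm, vulkan]
    steps:
      - uses: actions/checkout@v4

      - name: Build
        # Placeholder: invoke the real build here with the flags for ${{ matrix.backend }}.
        run: echo "build ava with ${{ matrix.backend }} enabled"

      - uses: actions/upload-artifact@v4
        with:
          name: ava-windows-${{ matrix.backend }}
          path: build-output/   # placeholder output directory
```

Each matrix entry would then produce its own downloadable .zip per backend.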

Deltrego commented 4 months ago

Howdy, I believe the frontend only needs to detect the GPU, VRAM, and CPU cores (if this is implemented as client-server rather than by statically integrating llama.cpp). One could ship multiple llama.cpp backends compiled with different flags for GPU and/or CPU optimizations. It would be more elegant to modularize llama.cpp so that GPU and AVX(/AVX2/AVX-512) support are compiled into separate dynamic libraries, but I'm not sure whether the code structure makes this difficult.
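
For what it's worth, a minimal sketch of that runtime-selection idea, assuming the backends were split into per-feature dynamic libraries. The library names and the split itself are hypothetical, and GPU probing is omitted:

```c
/* Hypothetical sketch: detect CPU features at startup and load whichever
 * backend build matches. The library names and the split into separate
 * libraries are made up for illustration; llama.cpp does not ship this way. */
#include <stdio.h>

#ifdef _WIN32
  #include <windows.h>
  typedef HMODULE lib_handle;
  #define LOAD_LIB(name)  LoadLibraryA(name)
#else
  #include <dlfcn.h>
  typedef void *lib_handle;
  #define LOAD_LIB(name)  dlopen(name, RTLD_NOW)
#endif

/* Pick a backend name from compiler-provided CPU feature checks (GCC/Clang).
 * GPU detection (CUDA/ROCm/Vulkan) would need separate probing. */
static const char *pick_backend(void) {
#if defined(__GNUC__) || defined(__clang__)
    if (__builtin_cpu_supports("avx2")) return "llama_avx2";
    if (__builtin_cpu_supports("avx"))  return "llama_avx";
#endif
    return "llama_generic";
}

int main(void) {
    const char *backend = pick_backend();
    char path[256];
#ifdef _WIN32
    snprintf(path, sizeof(path), "%s.dll", backend);
#else
    snprintf(path, sizeof(path), "./lib%s.so", backend);
#endif

    lib_handle lib = LOAD_LIB(path);
    if (!lib) {
        fprintf(stderr, "failed to load %s\n", path);
        return 1;
    }
    /* From here, look up the backend's entry points with
     * GetProcAddress/dlsym and hand them to the server layer. */
    printf("loaded backend: %s\n", path);
    return 0;
}
```

The same lookup table could be extended with GPU entries once the frontend (or server) knows which runtime (CUDA, ROCm, Vulkan) is actually present on the machine.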