b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
MIT License

Support for GGUF files? #94

Open ravor-org opened 5 days ago

ravor-org commented 5 days ago

Hi,

I'm trying to implement a local AI for my home-assistant instance using my 4-server KVM cluster. Unfortunately, the model I want to use only comes as GGUF files (https://huggingface.co/acon96/Home-3B-v3-GGUF) for llama.cpp. Is there any way to convert these models for Distributed Llama? I don't have access to any current GPU for that job, so I thought I could try distributing the model across my available compute cluster.

I'm also unsure if the model itself is compatible at all. As far as I've read, Zephyr is based on Mistral and Mixtral and should be compatible according to your README, but I'm fairly new to the AI world and it can be quite overwhelming with all the new terms. So if the question is ridiculous, tell me so. :)

BR, RaVoR

b4rtaz commented 4 days ago

Hello @ravor-org, Distributed Llama does not support the GGUF format.

To begin with, I recommend trying the available models via the launch.py Python script.
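
For example, something along these lines (a minimal sketch; the exact model identifiers that launch.py offers depend on the repository version, so the name below is only a placeholder):

```sh
# Clone the repository and run the launcher.
# <model_name> is a placeholder -- substitute one of the models
# the script actually supports in your checkout.
git clone https://github.com/b4rtaz/distributed-llama.git
cd distributed-llama
python launch.py <model_name>
```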