b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
MIT License

dllama-api hosted on 127.0.0.1 #82

Open unclemusclez opened 1 month ago

unclemusclez commented 1 month ago

Is there a way to change this to make it remotely available?

DifferentialityDevelopment commented 1 month ago

As far as I'm aware there isn't currently an argument to specify the host, but it should be relatively simple to add.
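
For illustration, here's a minimal sketch of what wiring a host argument into the server's bind call could look like, using plain POSIX sockets. The `--host`/`--port` flag names, the default port, and the argument parsing are assumptions for the sketch, not distributed-llama's actual code:

```cpp
// Hypothetical sketch: accept a --host argument and bind the listening
// socket to that address. Not distributed-llama's real CLI or defaults.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main(int argc, char** argv) {
    const char* host = "0.0.0.0"; // default: listen on all interfaces
    int port = 9990;              // hypothetical default port
    for (int i = 1; i < argc - 1; i++) {
        if (strcmp(argv[i], "--host") == 0) host = argv[i + 1];
        if (strcmp(argv[i], "--port") == 0) port = atoi(argv[i + 1]);
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, host, &addr.sin_addr) != 1) {
        fprintf(stderr, "invalid host: %s\n", host);
        return 1;
    }
    if (bind(fd, (sockaddr*)&addr, sizeof(addr)) < 0) { perror("bind"); return 1; }
    if (listen(fd, 16) < 0) { perror("listen"); return 1; }
    printf("listening on %s:%d\n", host, port);
    // ... accept loop would go here ...
    close(fd);
    return 0;
}
```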

b4rtaz commented 1 month ago

The API binds to 0.0.0.0, so the server should be visible from anywhere on your local network.
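
If you want to verify that from another machine, a small TCP probe is enough; the host address and port below are placeholders for wherever dllama-api is running:

```cpp
// Minimal reachability probe: attempt a TCP connect to the server's LAN
// address. Host and port are placeholders; substitute your own values.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const char* host = "192.168.1.10"; // placeholder: LAN IP of the server
    const int port = 9990;             // placeholder: dllama-api port

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, host, &addr.sin_addr);

    // A server bound to 0.0.0.0 accepts this connect from any LAN machine;
    // one bound to 127.0.0.1 only accepts connects made on the server itself.
    if (connect(fd, (sockaddr*)&addr, sizeof(addr)) == 0)
        printf("reachable: %s:%d\n", host, port);
    else
        perror("connect");
    close(fd);
    return 0;
}
```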