distantmagic / paddler

Stateful load balancer custom-tailored for llama.cpp 🏓🦙
MIT License

Is it possible to deploy more than 1 balancer? #9

Closed: wwulfric closed this issue 3 months ago

wwulfric commented 4 months ago

The llama-server state is stored in the balancer's memory, so it seems this would not work with multiple balancers.

mcharytoniuk commented 4 months ago

@wwulfric I haven't added a shared state feature (backed by some external storage) because I haven't needed it yet, but I am all for it if there is demand.

Are you asking because you need high availability, or because you have exhausted all the server resources (i.e., the traffic is so high that a single Paddler instance cannot handle it)?

If you need high availability, I would approach it the same way you would deploy HAProxy: an additional standby Paddler instance that starts accepting traffic if the primary Paddler instance fails.

If traffic volume is the issue, then shared storage might indeed be the solution, and I will add it.


I also see another option: stacking Paddler instances. They all expose a /health endpoint compatible with llama.cpp, so you can set up a Paddler agent that observes a Paddler instance's /health endpoint instead of llama.cpp's. For example:

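```shell
# --external-llamacpp-*: point these to the child Paddler balancer's reverse proxy
# --local-llamacpp-*:    point these to the child Paddler balancer's /health endpoint
# --management-*:        point these to the parent Paddler balancer
# (comments must not sit between backslash-continued lines, or the command is cut short)
./paddler agent \
    --external-llamacpp-host 127.0.0.1 \
    --external-llamacpp-port 8088 \
    --local-llamacpp-host 127.0.0.1 \
    --local-llamacpp-port 8088 \
    --management-host 127.0.0.1 \
    --management-port 8085
```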

This way you can have, for example, three Paddler instances (which you can also combine with HA standby instances). The first and second instances each manage half of your llama.cpp fleet, and the third instance manages those two child Paddler instances. That limits the number of reports each Paddler instance has to accept from its agents, at the cost of an additional hop in the infrastructure.
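To make the effect on report volume concrete, here is a back-of-the-envelope count. It assumes each agent sends one report per reporting interval and that a fleet of $n$ llama.cpp servers is split evenly across $k$ child balancers, with one agent per child reporting to the parent:

```math
\begin{aligned}
\text{single balancer:}\quad & R = n \\
\text{each child balancer:}\quad & R_{\text{child}} = n / k \\
\text{parent balancer:}\quad & R_{\text{parent}} = k \\
\text{example } (n = 20,\ k = 2):\quad & R_{\text{child}} = 10,\quad R_{\text{parent}} = 2
\end{aligned}
```

Under these assumptions the busiest instance handles $\max(n/k,\ k)$ reports per interval instead of $n$.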

wwulfric commented 3 months ago

@mcharytoniuk Thank you. I'm only concerned about high availability.

mcharytoniuk commented 3 months ago

No problem. In that case, I'd recommend using keepalived in front of Paddler, combined with a redundant Paddler instance on standby.
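keepalived decides which host holds the shared virtual IP, and it can track a check script through its `vrrp_script` and `track_script` directives, so a host whose check keeps failing can hand the virtual IP over to the standby. Below is a minimal sketch of such a check script. It assumes the balancer's reverse proxy answers on the /health endpoint mentioned above. The default address 127.0.0.1:8088 is borrowed from the agent example, and the `PADDLER_HEALTH_URL` variable and `check_paddler_health` function are hypothetical names, not part of Paddler.

```shell
#!/usr/bin/env bash
# Hypothetical keepalived check script for a Paddler balancer (a sketch, not part of Paddler).
# Assumption: the balancer answers GET /health at PADDLER_HEALTH_URL; the default is a placeholder.
PADDLER_HEALTH_URL="${PADDLER_HEALTH_URL:-http://127.0.0.1:8088/health}"

# Returns 0 if the endpoint responds within 2 seconds with an HTTP status below 400;
# returns non-zero on connection errors, timeouts, or HTTP 4xx/5xx (curl --fail).
check_paddler_health() {
    curl --silent --fail --max-time 2 --output /dev/null "$1"
}

# Run the check only when executed directly (as keepalived does), not when sourced.
# In bash, BASH_SOURCE differs from $0 when the file is sourced.
if [ "${BASH_SOURCE:-$0}" = "$0" ]; then
    check_paddler_health "$PADDLER_HEALTH_URL"
fi
```

On both the primary and the standby host, the script would be referenced from a `vrrp_script` block (with `script` and `interval`) in keepalived.conf and listed under `track_script` in the `vrrp_instance`, so that keepalived reacts when the local Paddler stops answering.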