Open jkroonza opened 2 years ago
That's an interesting idea.
Many years ago, I wrote some proprietary code to allow clustering of PPPoE servers for the ServPoET product (https://finepoint.com/servpoet/)
This used a much more complicated form of clustering: The PPPoE servers would elect a controller and only the controller would handle PADIs. The controller would monitor the number of sessions on each server and when it received a PADI, it would send a message to the least-loaded server telling it to send a PADO. There was a heartbeat mechanism so that if the controller disappeared, the remaining servers would elect a new controller.
Unfortunately, I can't release that code, but your delayed-PADO is a good idea. While it wouldn't achieve the same level of balance as the cluster controller method, it's probably good enough.
To implement it, you'd just have to add a timer event to the event loop that fires in N ms and issues the PADO. I don't think it would be too difficult.
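To make the idea concrete, here is a minimal sketch of that timer approach in Python (names like `send_pado` and `handle_padi` are illustrative, not the server's actual API):

```python
import sched
import time

# Hypothetical sketch: instead of replying to a PADI immediately, schedule a
# one-shot timer event that issues the PADO after the configured delay.
scheduler = sched.scheduler(time.monotonic, time.sleep)
sent = []

def send_pado(peer_mac):
    # In a real server this would build and transmit the PADO frame.
    sent.append(peer_mac)

def handle_padi(peer_mac, delay_ms):
    if delay_ms <= 0:
        send_pado(peer_mac)  # no delay configured: reply at once
    else:
        # Fire-once timer: the event loop calls send_pado() after delay_ms.
        scheduler.enter(delay_ms / 1000.0, 1, send_pado, (peer_mac,))

handle_padi("aa:bb:cc:dd:ee:ff", delay_ms=50)
scheduler.run()  # the event loop drains pending timer events
```

A real implementation would hook the same one-shot timer into the server's existing event loop rather than a standalone scheduler.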
Even with "perfect distribution", if a bunch of clients on one node suddenly disconnect you still won't have perfect balancing. We're not after perfect; perfect is the perfect enemy of good enough.
In order to load balance between multiple nodes, and to improve balancing, it may be beneficial to delay responses to PADI frames in some way.
For example, we currently fail over to Mikrotik nodes, whose pppoe-servers have a PADO delay option:
(Note the PADO delay option.)
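For reference, such a RouterOS configuration looks roughly like the following (illustrative values only; the exact parameter name and syntax may differ between RouterOS versions):

```
/interface pppoe-server server
add service-name=internet interface=vlan100 pado-delay=500 disabled=no
```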
What I'm suggesting is a "base delay" combined with a linear increment as the number of active sessions increases over a certain count, e.g.:
base delay: 500ms, increment: 10ms, delay sessions: 250
So the moment we hit 250 active sessions we start delaying PADO transmission by 500ms; when we reach 300 active sessions we delay 500ms + (300 - 250) * 10ms = 1000ms. In this way, less-loaded pppoe-servers should be preferred by new clients.
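The delay computation above can be sketched as a small function (the parameter names and defaults are taken from the example values, not from any existing implementation):

```python
def pado_delay_ms(active_sessions, base_ms=500, increment_ms=10, threshold=250):
    """Linear PADO delay: zero below the session threshold, then the base
    delay plus a fixed increment per session over the threshold."""
    if active_sessions < threshold:
        return 0
    return base_ms + (active_sessions - threshold) * increment_ms

print(pado_delay_ms(249))  # 0    (below threshold: no delay)
print(pado_delay_ms(250))  # 500  (threshold just reached: base delay)
print(pado_delay_ms(300))  # 1000 (500 + 50 * 10)
```

Setting `base_ms` to a large fixed value with `threshold=0` on a backup server also gives the active-standby behaviour mentioned below.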
This can also be used for more of an active-standby approach: the primary server never delays, while the backup always delays by some amount.
The one challenge here is that we'd need a transmission queue, integrated into an event loop somewhere, to transmit at the right time (or close enough to it).
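One way to structure such a queue is a min-heap keyed on the transmit deadline, with the event loop's timeout driven by the earliest entry. A minimal sketch, assuming a select()-style loop (the `TxQueue` class and its methods are hypothetical, not existing server code):

```python
import heapq
import itertools

class TxQueue:
    """Delayed-transmission queue: frames are held until their deadline."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def schedule(self, send_at, frame):
        heapq.heappush(self._heap, (send_at, next(self._counter), frame))

    def next_timeout(self, now):
        """Seconds until the next frame is due, or None if the queue is empty.
        This would become the event loop's select() timeout."""
        if not self._heap:
            return None
        return max(0.0, self._heap[0][0] - now)

    def pop_due(self, now):
        """Return all frames whose deadline has passed; the loop transmits these."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due

q = TxQueue()
q.schedule(1.5, "PADO-for-client-A")
q.schedule(0.5, "PADO-for-client-B")
print(q.pop_due(1.0))  # ['PADO-for-client-B']; client A's frame is not yet due
```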
This is extremely low priority currently, but it's another of those ideas I'd prefer not to lose track of.