pppoe-upgrade without session drops

jkroonza commented 2 years ago

Another crazy idea, that may or may not be viable.

Given a restart signal, somehow encode data required to reconstruct state prior to an execve, such that the replacing process (which may be a newer version of pppoe-server) can reconstruct internal state, and continue on keeping existing sessions and interfaces.

For now just let this rest here to think about, this would be non-trivial to say the least and I don't see that upgrades to pppoe-server itself is something that would happen frequently.

jkroonza commented 1 year ago

@dfskoll we're debating this in the office currently, we're trying to determine MINIMUM information that needs to be carried over. So the strategy currently is:

create a pipe(2) prior to execve of the new version.
set an environment variable to indicate the FD number of the read end.
execve.
new instance parses parameters and (pending final version #16 ) loads any config scripts that may be applicable.
new instance (if environment variable is set) read lines from the pipe() that provides information on: 5.1. Interface file descriptors that passed over execve() (depending on what you did previously we may need to unset FD_CLOEXEC). Essentially interface name and file descriptor number. 5.2. Current sessions, including the following information per session: 5.2.1. Remote MAC. 5.2.2. Session Number. 5.2.3. Session Interface. 5.2.4. pppd PID
Shutting down instance would need to write above information to replacement instance.

There will be other caveats (like handling the PID file).

Is there any other information you can think off that may be needed to be carried over? There are other possible issues:

"carried over sessions" for interfaces which are not under the replacement instance's control (ie, someone dynamically added an interface to a running instance, what happens with the associated sessions)?
Other's we can't even currently begin to think of.

Are there any other "don't forget about" pearls of wisdom that you're able to impart prior to me commencing work on this?

dfskoll commented 1 year ago

@jkroonza, I think re-execing the rp-pppoe program is not a good idea. Instead, I'd update all the data structures to be dynamic. So instead of an array of ClientSessions to track sessions, make it a resizable hash table. Instead of an array of Interface objects to track interfaces, make it a hash table. I think those are the only two static global data structures, so making them dynamic should allow you to add interfaces and change the maximum number of sessions without re-execing the program.

Removing an interface or lowering the number of sessions will require care. If you lower the number of sessions below the current number of active sessions, it's probably best simply to disallow new sessions until the number of active sessions drops below the limit. For removing an interface, you could either make it fail if there are active sessions using the interface, terminate all such sessions, or delay removal until the sessions end (and disallow creation of new sessions on that interface.)

jkroonza commented 1 year ago

@dfskoll those changes are already in the pipeline on this side via mentioned PR. We also want to implement a startup-script of commands as part of that PR. The upgrade process I'm trying to figure out, might be that as part of that PR we end up implementing a mechanism to dump the current running configuration as a script, which could in theory then be what ends up being sent over the pipe() here, or even written to a temp file and then loaded in the new instance.

This pertains specifically to ugprades of pppoe-server where we'd prefer to NOT disconnect ongoing sessions. In other words, when we want to replace the currently running pppoe-server instance with a newer (or technically older or same if we should so choose) version of pppoe-server (or if some dependent library was updated and we want to ensure that the newest version of the software and libraries) are running. This process should be the exception to changing the running configuration though.

Currently that involves shutting down pppoe-server which terminates all running sessions.

The argument can be made that pppd instances too may need to be restarted and this is true, but that we can then stagger over time so that we can force reconnection of a handful of customers at a time over several hours so that if this triggers a problem we didn't anticipate we could stop the process and solve the problem whilst it's only affecting a handful of customers rather than potentially several hundred. The other (potentially valid) approach could be to take the ostrich apoach and simply not care if pppd instances are leaked (since IP allocations are delegated), at worst a sysadmin could manually kill pppd's for those leaked sessions, and if they end up sending a PADT we, since we don't know about the session simply ignore it.

This also raises another possible approach:

Spawn a new pppoe-server and set the old one to drain+exit so the old one can still manage the pppd pids for old sessions. And once they're all dead simply terminate. This may also be a valid approach.

Currently we're discussing options internally around the upgrade / replace pppoe-server mechanism (which is something we require) as well, but we are also looking for your inputs as you plainly have more experience with this than we do.

As it stands we thus see a couple of "ugprade path" options:

dump config + load config over execve (temp file or via pipe). Carrying over STATE so the original pppoe-server can terminate. Either via pipe(2) or some temp file, or hell, even an environment variable. State could include full running config, or only part thereof. We could, for example, only carry over session information, and if the new instance is not configured for a specific interface, create that interfact "on demand" but set that interface immediately to drain + deconf when it's created.
Start a fresh daemon and simply exit the parent (losing state about existing pppds, thus leaking them and if we get a PADT we don't kill the pppd's but hopefully the ppp protocol too will shut down properly from the remote side, alternatively a sysadmin can kill the pppd which is what we currently do anyway if we need to disconnect a specific session).
Start a fresh daemon and set the old one to drain + exit (so the old pppoe-server will stay alive until all it's known sessions are dead).

Options two and three are primarily problematic if ip delegation to pppd is not being used. As can be seen variations are also possible in the above paths. As it stands we need to add a bunch of interfaces again in the coming weeks, and if possible we'd like to take the opportunity to deploy at the very least the code that I'm busy trying to get ready for the PR, but if possible I would not mind also getting something live for a more sensible upgrade path going forward.

dfskoll commented 1 year ago

Well, I still think that re-execing the program is a bad idea. I would rework the PR so it doesn't require that. Not having to re-exec will make everything else vastly simpler.

dfskoll commented 1 year ago

Oh, I just saw your note about no-drop upgrades. IMO, that is a somewhat unrealistic goal. Would you try to implement no-drop kernel upgrades? The reality is that a short amount of downtime for software upgrades is pretty normal and expected. If you want to support this, you'd have to version the information passed between the old and the new program, because new versions of the server might have additional state that old versions don't know about. All-in-all, sounds vastly too complex to me.

But if you really need that, then your option (1) is the only way. And if you want to preserve parent/child relationships, you can't fork the server... you can only exec a new copy so that it is still the parent of the pppd processes and will receive SIGCHLD signals when they exit. So that means dumping state to a file and restoring it from there.

jkroonza commented 1 year ago

We realized the parent/child thing ... two options:

encode all info into an environment variable
fork(), and execve the parent.

dfskoll commented 1 year ago

What is the point of the fork? Why not just dump state to a file, execve the program, and restore state from the file?

dfskoll commented 1 year ago

Actually, there's a race condition. execve will reset the disposition of the SIGCHLD handler to SIG_DFL and if a SIGCHLD comes in before you can re-establish the SIGCHLD handler, you may miss a signal. What you'll have to do, I think, is in the newly-exec'd program, re-establish the SIGCHLD handler and then do a loop of nonblocking wait calls to pick up any child processes that exited between the execve and the re-establishment of the SIGCHLD handler.

jkroonza commented 1 year ago

That's valid, thanks for the heads up.

jkroonza commented 1 year ago

For anyone bumping into this, this can now be (to a degree) done by setting pppoe-server to drain via the control socket, and then bringing up an additional pppoe-server.

This makes a few assumptions:

You're NOT using pppoe-server to allocate IPs (-R 0.0.0.0)
There won't be session number overlaps (I believe -o should take care of this, so if you have max 256 sessions, you need to make sure what the currently running -o options are combined with their -N values).
I'm not sure if further action MAY be required on PADR side to prevent the "old" instance from still initiating sessions, this still needs to be tested.

I'm still contemplating automating the process in some way, but that's no longer high on the priority list.

dfskoll / rp-pppoe

pppoe-upgrade without session drops #13