Dynamic Interface list update

jkroonza commented 2 years ago

Currently the list of PPPoE interfaces is specified on the CLI with -I.

I suggest implementing an option whereby a file with a list of interfaces can be given, one per line (ideally with an option for comments, or a space separated comment (We currently have around 70 interfaces).

Currently one could run an instance per interface, but this requires an IP Pool per instance (without #8 - which would still allow a combined pool).

The downside of instance per interface (or group of interfaces) even with #8 is higher memory use, although, I suspect this is overall fairly negligible.

jkroonza commented 2 years ago

@dfskoll I've been thinking about this one a bit. Would appreciate input on the below. Sorry for the essay, just want to make sure all bases are covered.

It seems pointers to Interfaces are kept in sessions. If interfaces could go away during this change, it may be a good option to ref-count interfaces (or somehow determine , but it also means we cannot re-allocate the array - are you in any way objected to me turning the Interface* list into a single-linked list? It means "removals" from the list are going to be a potentially slow operation, but I'm OK with that (I don't see this list being longer than a few hundred max. We're already contemplating rather grouping interfaces for different interfaces groups (eg, by FNO). That way we can somewhat trivially shut related interface groups.

For the time being with session numbers being indexes into an array I'm actually quite happy with that for the sake of performance, but in theory if we can allocate session numbers on a per interface basis (?) then we can exceed the 64k session limit, and we can then start looping session numbers on a per interface basis, there may be another benefit to turning sessions into linked lists too, in that we have a per interface list (also allows cleaning up interfaces once all sessions have terminated, or trivially killing all sessions for a relevant interface - it does slow down finding a specific session, but since we only ever expect PADTs for this - and can go directly to the correct Interface, or a child process dying (?) I don't see that as a major concern).

Currently my understanding is that the session id is per pppoe-server instance, I'm quite OK with keeping it that way for the time being (for the sake of smaller incremental changes). I believe it's possible to make this per interface (or even per remote client). This is way outside the scope of my current planning, but can definitely be done if needed.

As I understand, a session is uniquely identified by two mac addresses and the session identifier? In other words, in theory we can have an unlimited number of sessions per interface, assuming that we calculate the remote side's MAC as part of the unique identifier for a ClientSession?

In terms of "administrative" interface, I am suggesting to do the following:

Add a -i argument to pppoe-server, which similar to -p reads from a file.
If the file given to -i is executable, execute it and process it's stdout output rather than the file itself.
This processes lines of input, each line is simply an interface name, which MUST EXIST at the time of processing.
A signal is used to indicate re-processing.

Internally things gets a bit more complicated, the current structure whilst extremely simple, does have one major draw-back: Due to a pointer being kept in the ClientSession

I note there are a number of L2TP bits and pieces, which I'm not familiar with (but might need some time soon, testing to be actioned in the next month of two), and this may all influence things somehow. So please confirm the following strategy:

Split out ip pool allocation from ClientSession structures (store some index here, this would also in theory allow for >65k IPs, however, I'm not concerned with that, nor do I know if linux even supports >65k active interfaces on a system in any case). Allocate this array structure ONLY in the case where -R or -p is specified (-L we can calculate even with -l on demand). This structure would be required to contain (local_address, remote_address, ClientSession*).
Move ClientSession array to a per-interface linked-list. Allocate session numbers from ip pool if it's in use (honouring -r), otherwise, use either a random session number from the full 64k range for -r (except 0?) or a cyclic session number counter otherwise, in both cases checking that it's not yet in use (for -r simply pick another random number, for sequential, increment until available). There is a global session counter already to enforce -N.
Allow -N to be 0 for "not limited"? Since there won't be a fixed array any more the only reason I can envisage for this to still to be relevant is administrative reasons. We may later want to then rather move to a hash-bucket for finding sessions in the per-interface list.
Update the Interface* list to a linked list, with dedicated functions for adding interfaces to this list.
Implement -i option.
Implement a the signal driver, which would require: 6.1. Building on 4 to move all -I interfaces from the existing Interface list into the new Interface list. 6.2. Processing -i argument again in order to move existing dynamic interfaces from old list to new. 6.3. Creating any new Interfaces that we require. 6.4. For all interfaces not yet moved: 6.4.1. (optionally) terminate all current sessions (similar to current shut-down). 6.4.2. Mark these interfaces for deletion and put them at the front of the Interface list (faster finding later). 6.4.3. Stop responding to PADI frames for these Interfaces 6.4.3. As the ClientSession's associated with these Interfaces terminate, close them, and remove them from the Interface* list.

If you are happy with this strategy I'm quite happy to start with the implementation. We are in fair need of this, but we also don't want to start down a road that's not sustainable, and you definitely understand this stuff better than me.

jkroonza commented 2 years ago

@dfskoll

dfskoll commented 2 years ago

Moving the interface list to a dynamic structure makes sense. Whether it should be a linked list or a hash table is something that has to be decided based on the use-case. It could be that an O(N) lookup for an interface is too slow.

As for implementation, I would have the PPPoE server take a command-line option that makes it open a UNIX-domain (or TCP) socket that listens for control connections. We could then define a command protocol for changing the config. For example, to add an interface, maybe we could send the command addif IFNAME over the control socket, and to delete it, we could use delif IFNAME and the server would take care of all the necessary cleanup. Additional commands could obtain data from the server such as number of sessions, etc.

This is what I implemented for ServPOET. Clearly, we can't reuse that source, but I think the idea is sound.

Unfortunately, I have a full-time day job and also an after hours comedy career of sorts, so I have pretty much no time to work on this. I'd be happy to discuss implementation ideas with you, however, and review/comment on PRs.

dfskoll commented 2 years ago

Ah, and by "This is what I implemented for ServPOET", I meant the control socket, not a method to dynamically add or remove interfaces.

jkroonza commented 2 years ago

Thank you. I think I've got enough to get started. Unless you have objections to any of the below I'm going to push forward with this during the week.

I like that idea of a control socket, but it's considerably more work. Signal is I believe less work. Also, it enforces that you don't make a live change and then forgot to also update the config file (which is something that's been known to happen). I would implement two signals for now though (and sure we can add a control socket at some point), USR1 - reload reloadable (eg, Interface* if there is a file, and that file has been updated since the last reload), and USR2, a form of state dump which basically just outputs the current server state to a file, say /tmp/pppoe-server-${pid}-${seconds_since_epoch}.txt (this can assist with debugging probably).

Since I don't believe the Interface array is iterated frequently I believe a linked list should be adequate. I believe the event handlers already have an Interface at hand based on the file descriptor, so other than for a very short period when the reload happens I don't believe there should be performance impact, and compared to restarting the daemon ... whatever I end up doing is definitely less disruptive, as long as it doesn't take several seconds, but I believe even stalling the daemon up to two seconds or so should be just fine, and I believe this should be well sub 1s regardless.

A hash table for per-Interface ClientSession's might make more sense than for Interface*, but again, how frequently do we really need to search those (we use kernel mode and our sessions are generally long - except when Eskom does load shedding we're typically looking at several days/session)? And this can be changed if the performance isn't sufficient using a linked list, but I do like the idea of a hashtable here just to pre-empt this scenario.

Do we keep session numbers on a per interface basis, or do I make it per remote MAC? Ie, what's the key into the hash table? (packet->session) or (packet->ethHdr.h_source, packet->session)?

Do you have a preference in terms of hash function to use? Since session id's are 16-bit integers, if we only use then a simple % is probably fine, else if we include the remote MAC then that's 8 bytes, we can probably thread that as a sequence of 8 bytes with a loop, something like (R=small prime, eg 11/17/31, M=table size - also prime, say 97 or smaller, by default, but we can scale this up as the required data increases automatically say up to a maximum size of 997, every time we reach say an "average bucket size" of 16 or so entries?):

hash = 0;
for each byte as i
    hash = (hash * R + i) % M

dfskoll commented 2 years ago

On Mon, 23 May 2022 12:42:07 -0700 Jaco Kroon @.***> wrote:

Thank you. I think I've got enough to get started. Unless you have objections to any of the below I'm going to push forward with this during the week.

OK.

I like that idea of a control socket, but it's considerably more work. Signal is I believe less work.

It is, but it's also tricky. If you want to use signals, then I recommend doing as little as possible in the signal handler itself, but just setting a flag for the main loop to indicate that there was a signal. I usually use the technique of creating a pipe and having the signal handler write a byte to the write-end of the pipe, which then is part of the poll array so it wakes up the poll statement in the main loop. There's an example of this in another of my projects:

https://github.com/dfskoll/mailmunge/blob/f34be5b0869757fab82dab6e52d4f722169009f6/c/mailmunge-multiplexor.c#L1421

Also, it enforces that you don't make a live change and then forgot to also update the config file (which is something that's been known to happen).

That's true, but in order to avoid dropping existing sessions, we'd have to read the control file and then compare it to the current state, dropping interfaces that were removed and adding new ones. That seems tricky. If we just exec a new copy of pppoe-server, we'll lose all existing session information.

Since I don't believe the Interface* array is iterated frequently I believe a linked list should be adequate.

OK.

[...]

A hash table for per-Interface ClientSession's might make more sense than for Interface*, but again, how frequently do we really need to search those (we use kernel mode and our sessions are generally long

except when Eskom does load shedding we're typically looking at several days/session)? And this can be changed if the performance isn't sufficient using a linked list, but I do like the idea of a hashtable here just to pre-empt this scenario.

We search the session list when there's a PADT, I guess. That doesn't happen that often and modern hardware is fast, so you may be right... a linked list of sessions might be OK.

Do we keep session numbers on a per interface basis, or do I make it per remote MAC? Ie, what's the key into the hash table? (packet->session) or (packet->ethHdr.h_source, packet->session)?

If we have one hash table per interface, then the hash key only needs to be the session number. If we have one big hash table, then it'll have to be some interface identifier combined with the session number.

Do you have a preference in terms of hash function to use? Since session id's are 16-bit integers, if we only use then a simple % is probably fine,

If we use one hash table per interface, then yes... % is fine.

[...]

something like (R=small prime, eg 11/17/31, M=table size - also prime, say 97 or smaller, by default, but we can scale this up as the required data increases automatically say up to a maximum size of 997, every time we reach say an "average bucket size" of 16 or so entries?):

Yep, resizable hash tables are not too hard to do. There's a non-resizable implementation included with rp-pppoe in libevent/hash.c. It's used by the L2TP code (which is not included in the GPL'd version of rp-pppoe) but feel free to leverage it. If you need a hand understanding the API, I can help out. It could probably be made resizable without too much work.

Regards,

Dianne.

jkroonza commented 8 months ago

Another issue is what happens if the underlying Interface gets shut down (ip li del of a VLAN for example) whilst rp-pppoe has the socket open. Not even sure how to detect that. Busy working on another (semi related) project where this came up.

dfskoll commented 8 months ago

We should test if it's even possible to delete an interface while a socket is open. I suspect one of the three scenarios will happen:

The ip link del will fail, or
The ip link del will succeed, but the existing socket will keep working (ie, there's a reference count in the kernel that only fully deletes the interface when the last descriptor closes), or
The ip link del will succeed and subsequent operations on the socket will fail.

Only scenario (3) is actually a problem for rp-pppoe, so we should test to see if it's what happens.

dfskoll / rp-pppoe

Dynamic Interface list update #10