Hurricos / realtek-poe


realtek-poe: Reduce number of port setup packets #19

Closed mrnuke closed 2 years ago

mrnuke commented 2 years ago

In response to #18, try to be more efficient with port setup.

Use 4 port commands efficiently

Most commands used for configuring port parameters can set up to four ports per packet. Setting only one port and filling all other fields with 0xff is woefully inefficient. How inefficient? On a 24-port switch, each command could be sent in 6 packets instead of 24.

To achieve this, pack 4 port IDs and values in each command, and skip unused or disabled ports.
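For illustration, a minimal single-pass sketch of that loop. poe_cmd(), struct port_config, and the field names are stand-ins rather than the project's actual types, and the real patch is structured differently (see below):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    struct port_config {
        uint8_t enabled;
        uint8_t value;      /* the per-port parameter being configured */
    };

    /* Stand-in for the low-level helper that builds and sends one frame. */
    int poe_cmd(uint8_t cmd_id, const uint8_t *payload, size_t len);

    /* Pack up to four (port, value) pairs per frame; unused slots stay 0xff. */
    static void poe_set_port_values(uint8_t cmd_id,
                                    const struct port_config *ports,
                                    size_t num_ports)
    {
        uint8_t payload[8];
        size_t used = 0;

        memset(payload, 0xff, sizeof(payload));

        for (size_t i = 0; i < num_ports; i++) {
            if (!ports[i].enabled)
                continue;   /* skip unused or disabled ports */

            payload[used * 2] = i;                  /* port ID */
            payload[used * 2 + 1] = ports[i].value; /* parameter */

            if (++used == 4) {
                poe_cmd(cmd_id, payload, sizeof(payload));
                memset(payload, 0xff, sizeof(payload));
                used = 0;
            }
        }

        /* Flush a final, partially filled frame. */
        if (used)
            poe_cmd(cmd_id, payload, sizeof(payload));
    }

With all 24 ports enabled on a 24-port switch, this sends exactly the 6 frames per command calculated above.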

I had considered that unpacking an array of structures into two arrays, only to repack them into one array, is itself "woefully inefficient". The CPU and RAM are five orders of magnitude faster than the serial link, though, and I was going for the least unreadable approach. I'm open to improvements as long as they reduce the line count.

Use port ID 0x7f (all ports) where supported

The set_port_disconnect_type and set_port_detection_type commands can configure all ports in one go if passed a port_id of 0x7f. On 24-port switches, using this approach significantly reduces the number of packets sent during configuration.

There is no anticipated need to change this config on a per-port basis, so send only one poe_cmd_port_disconnect_type() and one poe_cmd_port_detection_type() command, each with a port ID of 0x7f.
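Concretely, that reduces each of those commands to a single frame. A sketch, with the function names taken from this patch and the second argument's values made up for illustration:

    #define PORT_ID_ALL_PORTS 0x7f

    /* The type values below are illustrative, not the MCU's actual encoding. */
    poe_cmd_port_disconnect_type(PORT_ID_ALL_PORTS, 1 /* DC disconnect */);
    poe_cmd_port_detection_type(PORT_ID_ALL_PORTS, 2 /* 4-point 802.3af */);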

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>

svanheule commented 2 years ago

Instead of explicitly building 4-port commands, have you considered coalescing frames in poe_cmd_queue()?

The queueing function could check a list of command IDs that can be coalesced. If there is nothing queued, a frame can be sent immediately. Otherwise, you could (see the sketch after this list):

  1. check the last queued frame (or the entire list) for (frames with) the same ID,
  2. check whether the already-queued frame has room available for pooling ports,
  3. de-queue the old frame, update it, and re-queue it,
  4. queue the shortened new frame, or drop it if all its ports were coalesced into the old one.
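A rough, untested sketch of the idea; the frame structure and the poe_cmd_queue() signature are simplified stand-ins for the real ones, and the in-place update collapses steps 3 and 4:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    struct poe_frame {
        struct poe_frame *next;
        uint8_t cmd_id;
        uint8_t pairs[8];   /* up to four (port, value) pairs */
        int used;           /* pairs occupied, 0..4 */
    };

    bool cmd_is_coalescable(uint8_t cmd_id);    /* the command-ID whitelist */

    void poe_cmd_queue(struct poe_frame **head, struct poe_frame *frame)
    {
        if (cmd_is_coalescable(frame->cmd_id)) {
            for (struct poe_frame *f = *head; f; f = f->next) {
                if (f->cmd_id != frame->cmd_id || f->used >= 4)
                    continue;

                /* Fold ports from the new frame into the old one. */
                while (f->used < 4 && frame->used > 0) {
                    frame->used--;
                    memcpy(&f->pairs[f->used * 2],
                           &frame->pairs[frame->used * 2], 2);
                    f->used++;
                }

                if (!frame->used)
                    return; /* fully absorbed; caller drops the new frame */
            }
        }

        /* Append whatever is left at the tail of the queue. */
        frame->next = NULL;
        while (*head)
            head = &(*head)->next;
        *head = frame;
    }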

Just a thought, not sure if it would make the code shorter or easier to parse.

mrnuke commented 2 years ago

@svanheule, I had not considered such an approach until you brought it up.

Algorithm complexity

Searching lists is O(n), so we'd need to implement a look-up table to find the last queued command of a specific ID in O(1).
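Reusing the frame structure from the sketch above, the table could be as simple as an array indexed by the 8-bit command ID, with entries invalidated as frames are dequeued (names illustrative):

    /* Most recently queued frame per command ID; cleared on dequeue. */
    static struct poe_frame *last_queued[256];

    static struct poe_frame *find_coalesce_target(uint8_t cmd_id)
    {
        struct poe_frame *f = last_queued[cmd_id];

        /* Usable only while it still has free port slots. */
        return (f && f->used < 4) ? f : NULL;
    }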

Race conditions

At a low level, this makes sense, but how does this interact with the application layer? Say someone issues two "poe reload" commands in rapid succession. This is not that unusual when someone edits the config and saves often. procd sends a poe reload on config file changes.

  1. poe reload is issued
  2. first set of packets is queued
  3. poe reload is issued
  4. second set of packets is queued and coalesced with the first.

This might not be problematic from a user's perspective -- they only want the last config applied. It does, however, create a race condition leading to unpredictable behavior. How many packets get coalesced depends on how close the poe reloads are in time.

Building a bigger picture

You could solve the above by adding explicit coalescing barriers. Taking it one step further, the command queue becomes a serialized descriptor of the desired config.

And herein lies the rub: you already have the big picture stored in struct config. You just need to serialize it. The next logical step is to serialize it atomically in the first place, and forgo the last three points of added complexity.

We've come full circle back to where we started.

svanheule commented 2 years ago

Shouldn't a reload cause the command queue to be emptied of old frames anyway? You're parsing new settings, so there's no point in sending config that will soon be replaced. I also find it strange that you're ignoring parts of the config file on reloads... (e.g. global power budget)

mrnuke commented 2 years ago

Shouldn't a reload cause the command queue to be emptied of old frames anyway?

It depends. By the time a poe reload comes in, the daemon is most likely past the configuration stage and is sending port queries. If only one port changed, it makes no sense to interrupt the flow of commands.

There is a(n evil) way to empty out the command queue:

    /etc/init.d/poe restart

I also find it strange that you're ignoring parts of the config file on reloads... (e.g. global power budget)

Hehe, don't git blame me for that.

Hurricos commented 2 years ago

While I don't have a practical setup to measure how reducing the number of frames improves reliability, I am going to compare the raw volume of debug output (a rough proxy for frame count) using

    realtek-poe -d 2>&1 | dd of=/dev/null status=progress bs=1

over 90s on my GS1900-24HPv1, just to get an idea of what sort of reduction we're seeing.

Before:

    root@OpenWrt:/# realtek-poe -d 2>&1 | dd of=/dev/null status=progress bs=1
    222055 bytes (222 kB, 217 KiB) copied, 90 s, 2.5 kB/s^C
    222936+0 records in
    222936+0 records out
    222936 bytes (223 kB, 218 KiB) copied, 90.7382 s, 2.5 kB/s

After (tested twice):

    root@OpenWrt:~# realtek-poe -d 2>&1 | dd of=/dev/null status=progress bs=1
    211891 bytes (212 kB, 207 KiB) copied, 90 s, 2.4 kB/s^C
    213024+0 records in
    213024+0 records out
    213024 bytes (213 kB, 208 KiB) copied, 90.4276 s, 2.4 kB/s
    root@OpenWrt:~# realtek-poe -d 2>&1 | dd of=/dev/null status=progress bs=1
    211891 bytes (212 kB, 207 KiB) copied, 90 s, 2.4 kB/s^C
    213024+0 records in
    213024+0 records out
    213024 bytes (213 kB, 208 KiB) copied, 90.3878 s, 2.4 kB/s

Seeing that this is only a reduction in the number of frames sent during setup, and that setup finishes at around the 4 s mark, this is a pretty hefty reduction (close to 40% of the setup traffic). It also seems to me that ubus call poe info becomes available sooner after starting.

Hurricos commented 2 years ago

@mrnuke Is this a draft, or can I promote it to a regular PR and merge?

mrnuke commented 2 years ago

@Hurricos, the code is complete as is. Keeping it as a draft pending further testing, just in case something really crazy happens.

mrnuke commented 2 years ago

@Hurricos, oh, I missed your comment about extensive testing. If this didn't go up in flames, then I think we're good to go.

Hurricos commented 2 years ago

Let me test the new rebase.

Hurricos commented 2 years ago

Test looks good. It has the same issues as before: sometimes data gets "stuck" and doesn't come back:

                "lan21": {
                        "mode": "PoE",
                        "status": "Delivering power",
                },

... as opposed to:

                "lan21": {
                        "priority": 2,
                        "mode": "PoE",
                        "status": "Delivering power",
                        "consumption": 5.200000
                },

This doesn't stop power delivery, and for now, an /etc/init.d/poe restart fixes the issue, while (I think) #15 will deal with those instances without intervention.

I'm happy merging this.

Hurricos commented 2 years ago

I have tested changing configuration on ports as well; works as expected.

Merging.