Tracking issue: UISP integration and complex setups, relative to BracketQoS

thebracket commented 2 years ago

Setting this up as a tracking issue while I poke at UISP integration a bit. My intent is to gradually fix these issues and offer them up to LibreQoS (rather than just handing out my Rust-based tool and keeping it updated separately).

My "playpen" for working on this is here: https://github.com/thebracket/LibreQoS/tree/uisp-integration . My intent is to tackle issues there, and then turn them into merge-able PRs for Libre. Oh, and I grabbed a copy of this book from my publisher (I get e-books super-cheap since I work for them) and re-learned Python. :-)

One of the areas that took a lot of work with BracketQoS was getting our UISP setup to work with it. We run a mixed-vendor network, with UISP handling billing/CRM even for the parts that are running Cambium, Mimosa, MikroTik and a few others. When I ran the 1.3 UISP integration script against our network, it:

Complained that I didn't really mean uispBaseUrl, because UISPbaseURL was imported (fixed in #136)
Crashed because we had a couple of devices that weren't associated yet (fixed in #137)
Confused me by asking me if I wanted router or device IP tracking, but changing the value turned out to not do anything (fixed in #138)
Ignoring subnets didn't do anything (fixed in #142)

Once those were out of the way, I still ran into some issues:

ShapedDevices.csv only contains 59 devices. It should contain several hundred. (See https://github.com/rchac/LibreQoS/compare/main...thebracket:LibreQoS:uisp-integration for work in progress)
network.json shows a single node. The site it picked is one that doesn't contain any devices and isn't connected to the rest of the tree (long story short, someone jumped the gun and added it before we install the hardware in the coming weeks).
Many of our devices have more than one IP, but the current integration only looks at the one UISP picks as "management". That's particularly problematic for us, since we have a LOT of CPEs with separate management and traffic IPs.
There's no subnet support, which may be tricky and require some manual intervention. We have a few spots where a customer has an entire subnet, and we shape them collectively with a single speed limit in a single queue. For example, we provide lobby WiFi at a public housing facility. The lobby is handing out IPs in the range 100.64.20.0/24. I've been using the Trie support on our setup to lump them together (so we don't run NAT at the site; it goes to our egress NAT, making for a much faster/happier router and better queueing).
It looks like allowedSubnets is processed, but ignoredSubnets is not. We'll need that if we start processing all of the IPs found on devices, since we have about 600 different 192.168.15.0/24 subnets NATed at the customer. (fixed in #142)
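
A minimal sketch of the allowed/ignored subnet filtering described above, applied to every IP found on a device rather than only the management one. The function name, the parameter names, and the assumption that addresses may arrive with a prefix length attached are all hypothetical; this is not the LibreQoS or BracketQoS implementation.

```python
import ipaddress


def filter_ips(addresses, allowed_subnets, ignored_subnets):
    """Keep addresses that are inside an allowed subnet and outside every ignored one."""
    allowed = [ipaddress.ip_network(s) for s in allowed_subnets]
    ignored = [ipaddress.ip_network(s) for s in ignored_subnets]
    kept = []
    for raw in addresses:
        # Assumption: an address may carry a prefix length ("10.0.0.5/24"); drop it.
        ip = ipaddress.ip_address(raw.split("/")[0])
        if any(ip in net for net in ignored):
            continue  # e.g. the customer-side NAT ranges
        if any(ip in net for net in allowed):
            kept.append(str(ip))
    return kept


# Example: one CPE with a NATed LAN address, a CGNAT traffic address and a stray IP.
device_ips = ["192.168.15.1/24", "100.64.20.7/24", "8.8.8.8"]
shapeable = filter_ips(
    device_ips,
    allowed_subnets=["100.64.0.0/10", "192.168.0.0/16"],
    ignored_subnets=["192.168.15.0/24"],
)
```

Checking the ignored list first means an ignored range always wins, even when it sits inside a broader allowed range.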

Niceties I'd like to try and arrange:

Some choice of topology. Bracket lets you pick "flat" (every customer parented off the root), "AP only" (APs are a top layer), "Site only" (sites are top level entries and every customer feeds off of the site) and "full" (which builds a complete topology graph between sites and maps the entire network).
The bane of my existence: relays always break topology. (A "relay" being a customer fed via another customer.) BracketQoS occasionally fails on these. I swear my colleagues come up with new and interesting topologies to install every time I take a day off.
Suspended customers. One thing we found useful with Preseem - and ported over to our version of BracketQoS - was the ability to set a "suspended customers get this much Internet" option. We'd pick a low number, so their service sucked rather than being off altogether (helpful if they have VoIP and you don't want to cut off 911, and if your "pay your bill!" page is offsite)
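
As a rough illustration of the four topology choices listed above, the sketch below maps each customer to a parent node depending on the chosen strategy. The strategy strings, the dictionary layout, and the `site_links` mapping are assumptions made for the example; neither tool is known to be structured this way.

```python
def build_parents(customers, site_links, strategy):
    """Return {node: parent}; a parent of None means 'attached to the root'."""
    # Only the "full" strategy keeps the site-to-site links.
    parents = dict(site_links) if strategy == "full" else {}
    for c in customers:
        if strategy == "flat":
            parents[c["name"]] = None          # every customer off the root
        elif strategy == "ap_only":
            parents[c["ap"]] = None            # APs form the top layer
            parents[c["name"]] = c["ap"]
        elif strategy == "site_only":
            parents[c["site"]] = None          # sites form the top layer
            parents[c["name"]] = c["site"]
        elif strategy == "full":
            parents.setdefault(c["site"], None)
            parents[c["ap"]] = c["site"]       # site -> AP -> customer
            parents[c["name"]] = c["ap"]
        else:
            raise ValueError(f"unknown topology strategy: {strategy}")
    return parents


customers = [{"name": "cust1", "ap": "AP-1", "site": "TowerB"}]
site_links = {"TowerA": None, "TowerB": "TowerA"}  # TowerB is fed from TowerA

flat = build_parents(customers, site_links, "flat")
full = build_parents(customers, site_links, "full")
```

The sketch assumes node names are unique across sites, APs and customers; a relay (a customer fed via another customer) would need a customer to appear as a parent as well, which this does not handle.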

rchac commented 2 years ago

> My intent is to gradually fix these issues and offer them up to LibreQoS (rather than just handing out my Rust-based tool and keeping it updated separately).

Thank you!

> Some choice of topology. Bracket lets you pick "flat" (every customer parented off the root), "AP only" (APs are a top layer), "Site only" (sites are top level entries and every customer feeds off of the site) and "full" (which builds a complete topology graph between sites and maps the entire network).

That makes sense. I think adding a "flat" option would be great.

> The bane of my existence, relays always break topology. (A "relay" being a customer fed via another customer). BracketQoS occasionally fails on these. I swear my colleagues come up with new and interesting topologies to install every time I take a day off.

My solution has been to create a UISP site for each repeater PoP and have the host household as a client of that site. It's flexible and allows operators to have complex relays with multiple APs and such. Is this a reasonable workaround? If not we can try to have it better accommodate these relay site cases.

> Suspended customers. One thing we found useful with Preseem - and ported over to our version of BracketQoS - was the ability to set a "suspended customers get this much Internet" option. We'd pick a low number, so their service sucked rather than being off altogether (helpful if they have VoIP and you don't want to cut off 911, and if your "pay your bill!" page is offsite)

Hm, I just assumed suspension would be handled separately (we do redirect to payment portal via MikroTik) so I excluded suspended subscribers from even being shaped. This makes sense and wouldn't be that hard to implement. I think this is a good idea.
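
A minimal sketch of the "suspended customers get this much Internet" idea, assuming each subscriber record carries plan rates and a suspended flag. The field names and the 1 Mbps default are hypothetical placeholders, not values from either tool.

```python
SUSPENDED_MBPS = 1  # hypothetical operator-chosen floor for suspended accounts


def effective_rates(subscriber, suspended_mbps=SUSPENDED_MBPS):
    """Return (download, upload) in Mbps, capped when the subscriber is suspended."""
    down = subscriber["download_mbps"]
    up = subscriber["upload_mbps"]
    if subscriber.get("suspended", False):
        # Keep the customer in the shaper, but at a low rate instead of excluding them.
        return min(down, suspended_mbps), min(up, suspended_mbps)
    return down, up


active = effective_rates({"download_mbps": 100, "upload_mbps": 20})
suspended = effective_rates({"download_mbps": 100, "upload_mbps": 20, "suspended": True})
```

Using `min` keeps a plan that is already slower than the suspended rate from being raised by suspension.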

dtaht commented 2 years ago

Hilariously, I run out of bandwidth on cellular all the time. They actually rate limit it to about 2 Mbit/s with sane buffering, and with cake in the way on my USB tether, I hardly notice. Videoconferences still "just work", web pages get slow, but I don't use the web much.

thebracket commented 2 years ago

Suspension is an odd one. We work with a third party that provides VoIP to some of our customers, and they were pretty insistent on allowing 911 calls even if the Internet service is suspended. So we do the redirect also, but only for web traffic. (@dtaht would be able to do most things that weren't the web, and is smart enough to open a VPN... we don't block that, right now, so he'd have free service until our installer shows up for the gear... it's not perfect, but it's working)

The "site" model for relays is how you should do it, and we used to do it that way. We have something like 75 site-to-site relays now, and it became really unwieldy. So we have a bunch of client sites linked to other client sites. It's pretty ridiculous, but if I don't support it I get grumbles from down the hall...

A funny one. So a non-profit gets a big circuit from us. Easy - client site off of a tower. They realize that they really should be two non-profits and put up a building on the same site - which just happens to be inaccessible due to terrain. So now there's a relay from Charity 1 to Charity 2. Initially in the same client site because Charity 1 wanted to pay for it all. Of course, time passes and Charity 1 is complaining that Charity 2 is using all their bandwidth, so Charity 2 has agreed to pay for their own. No biggie, now Charity 2 is a client site - with its own bandwidth tracking. Another charity (they tend to cluster) sets up shop next to Charity 2, and wants a relay too. So now Charity 1 has a site with 3 client sites coming off of it. And it just keeps going. There's something like 5 charities, 2 of the managers' houses, a church and a barn all linked up - sometimes daisy chained. Ugh.

Edit: forgot to mention that they are all in a bowl-shaped valley with conservation department rules prohibiting tower construction.

rchac commented 2 years ago

I feel you there, building towers is pretty much a no-go where we are thanks to zoning, though we are considering OTARD hub towers to skirt around that. Tower construction limitations make these complex repeater setups inevitable. Given how many existing sites are already set up in UISP like that for your network, let's accommodate them going forward. =)

dtaht commented 2 years ago

I think cake really saves your bacon on each hop here... but I imagine it is all NAT hell?

thebracket commented 2 years ago

Not really NAT hell. There's a router at each site with links to other sites, with a "customer" port that provides connectivity to the customer. The routers relay DHCP requests from each router (adding option 82 data on the way) to ensure that whatever gets plugged into the customer's port receives the correct public IP.

I really should open source our "make option 82 work with UISP" setup, one day. In any client site, we set up an "other" device with a MAC address (equal to the MAC of the port providing service), the name "Service IP" and the intended IP address as the device's address. A program periodically reads UISP and builds a DHCP configuration (ye olde isc-dhcpd) and hot-reloads it when it changes. Combine that with Bracket assigning queues to the customer and it's really seamless. Whatever the customer plugs in gets the right IP, and is shaped appropriately. There's even a small pool of IPs for each area into which "we've no record of you existing" devices get dumped (with short lease times) and redirect to a page reminding our installer to finish the process.
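
The "read UISP, build a config, reload only when it changes" loop described above could be sketched roughly as follows. Everything here is an assumption for illustration: the device dictionaries stand in for whatever UISP returns, the rendered text is a placeholder rather than real isc-dhcpd syntax (the actual option 82 matching is not described in this thread), and the reload step itself is left out.

```python
import hashlib


def service_ip_entries(devices):
    """Collect sorted (mac, ip) pairs from the devices named "Service IP"."""
    return sorted(
        (d["mac"].lower(), d["ip"])
        for d in devices
        if d.get("name") == "Service IP" and d.get("mac") and d.get("ip")
    )


def render_config(entries):
    """Render entries as plain text; a real setup would emit DHCP server stanzas here."""
    return "".join(f"{mac} {ip}\n" for mac, ip in entries)


def needs_reload(new_text, old_digest):
    """Return (changed, digest) so the caller reloads only when the config differs."""
    digest = hashlib.sha256(new_text.encode("utf-8")).hexdigest()
    return digest != old_digest, digest


# Hypothetical device records; 203.0.113.0/24 is a documentation range.
devices = [
    {"name": "Service IP", "mac": "AA:BB:CC:00:11:22", "ip": "203.0.113.10"},
    {"name": "CPE radio", "mac": "AA:BB:CC:00:11:23", "ip": "10.0.0.2"},
]
text = render_config(service_ip_entries(devices))
first_changed, digest = needs_reload(text, old_digest=None)
second_changed, _ = needs_reload(text, old_digest=digest)
```

Sorting the entries before rendering keeps the output stable between runs, so a change in device order alone does not trigger a reload.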

dtaht commented 2 years ago

What y'all do is so different from my second-generation attempt in 2008. I wish I'd published it. I had had great pain with PPPoE in my first-generation network, and said screw it, used static IPv6 /48s as my underlying transport, allowed service or not based on the underlying radio MAC address, tunneled IPv4 under that, and split bandwidth up evenly (or so I thought) via SFQ. It was a minimum amount of service (5 Mbit) up to whatever was available, flat rate (well, I soaked the gringos and intended to subsidize the schools).

It was all you can eat, no complicated shaping needed. The CPEs did their own DHCP for IPv4. Of course, no billing systems or decent shaping systems existed at the time either!

thebracket commented 2 years ago

In my testbed, commit https://github.com/thebracket/LibreQoS/commit/5b57b9a8017b111377fee88a42df6ffa091d227d contains a bit more work on this:

I've got ignoredIPs doing something.
I've added some wrappers to make it easier for me to reason what's going on.
A good start on a flat topology.

dtaht commented 1 year ago

@rchac @thebracket it looks like you have covered most of this. What haven't you covered?

thebracket commented 1 year ago

BracketQoS obsessively puts every single infrastructure device into the device list, shared as per-site "infrastructure" entries. It might be worth porting that. Otherwise, the current implementation is better than the original BracketQoS setup.

thebracket commented 1 year ago

"Infrastructure" items (which may or may not be a good idea) and a good support-oriented long-term stats retention are the only remaining items on this. I don't think either is a 1.4 issue, changing the milestone.

bile0026 commented 1 year ago

+1 for "suspension" feature. Must have for my network.

LibreQoE / LibreQoS

Tracking issue: UISP integration and complex setups, relative to BracketQoS #140