LibreQoE / LibreQoS

A Quality of Experience and Smart Queue Management system for ISPs. Leverage CAKE to improve network responsiveness, enforce bandwidth plans, and reduce bufferbloat.
https://libreqos.io/
GNU General Public License v2.0

Alternative Network Topologies #9

Closed mjsteckel closed 1 year ago

mjsteckel commented 2 years ago

Hi,

I haven't deployed LibreQoS yet, but am exploring it with great interest after Dave Taht alerted me to it. I operate a WISP and, after reviewing the docs and code, suspect that our network topology might be outside of what LibreQoS can currently support. Granted, I'm new to this, so it may all come down to my being an fq/LibreQoS newbie.

Within our network, a given site may have one or more roles. (An upstream back-haul connection is a given for all sites.)

  1. P2MP Broadcast
  2. P2P relay to another site
  3. Tenant service (think apartment building, multi-dwelling building or MDU)

We currently have multiple sites with all combinations of 1, 2, and 3.

Our data center provides services to all of our sites, and in our case only has P2P links (no P2MP).

Two things are unclear to me from reading the LibreQoS docs and code:

A) Does LibreQoS support customers in an MDU? The current implementation seems to presume that all customers receive service via an AP, most likely a P2MP AP.

B) We provision customers in MDUs using PPPoE, though I can't imagine this makes a difference as long as we can specify the customer IP for shaping purposes.

Thanks

rchac commented 2 years ago

A) It certainly can: just treat the MDU site as an "AP" and set the bandwidth limit for that MDU based on the backhaul connection serving it (fiber, P2P). This works assuming you have 4 Gbps or less flowing through the MDU (the approximate per-core shaping limit of most CPUs).

B) LibreQoS, like Preseem and most middle-boxes, acts as a transparent bridge. It inspects and manipulates traffic going from your network core router to your network edge router, but does not actively participate in OSPF or other routing protocols (that is unnecessary). Assuming your PPPoE sessions terminate at your core, this should work fine.
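The sizing rule in (A) can be written down as a small check. This is only a sketch: the `shaped_node` struct, its field names, and both helpers are hypothetical and are not LibreQoS configuration or code. The 4 Gbps figure is the approximate per-core limit quoted above, not a measured value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption taken from the comment above: roughly 4 Gbps per CPU core. */
#define PER_CORE_LIMIT_MBPS 4000u

/* Hypothetical record for one shaped node; an MDU is modelled like an AP. */
struct shaped_node {
    const char *name;
    unsigned backhaul_mbps; /* capacity of the link feeding the site */
    unsigned limit_mbps;    /* shaping limit to configure for the node */
};

/* Cap the node at the capacity of the backhaul that serves it. */
static void set_limit_from_backhaul(struct shaped_node *n)
{
    assert(n != NULL);
    n->limit_mbps = n->backhaul_mbps;
}

/* True when the configured limit fits what one core is assumed to shape. */
static bool fits_one_core(const struct shaped_node *n)
{
    assert(n != NULL);
    return n->limit_mbps <= PER_CORE_LIMIT_MBPS;
}
```

For example, an MDU fed by a 1 Gbps fiber link gets a 1000 Mbps limit and passes `fits_one_core`, while a site behind a 10 Gbps link would not.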

mjsteckel commented 2 years ago

Re PPPoE: For us, PPPoE sessions terminate on the MDU router(s).

We have one or two routers in every MDU. The in-building routers are PPPoE servers (forwarding authentication to RADIUS servers at the data center).

The MDU routers establish the public customer IP, with routing handled by OSPF "redistribute connected".

rchac commented 2 years ago

Ah ok gotcha. Sorry, I had misused a term - I meant to ask if your PPPoE concentrator was on (or downstream of) the network core router.

So just to clarify:

1) Does your network have a separate core (aggregation) and edge (firewall, NAT) router?
2) If so, are these PPPoE MDUs routed through your core / aggregation router on their way out of your network edge?

I'm imagining this:

End-User <-> MDU PPPoE router / server <-> P2P to DC <-> Aggregation Router / Core <-> Edge Router <-> Upstream ISPs

If the answer to both questions is yes, then PPPoE would not be a concern, to my understanding. Even if the MTU is <1500, the packets should be identified and parsed correctly.

mjsteckel commented 2 years ago

  1. No. We only have what we call our "core" routers, which do both BGP and OSPF. We have two and they operate as an active/active pair.

Our customers are assigned public IPv4 addresses and their traffic is not NAT'ed. (Devices on our private network, with 10/8 IPs, are NAT'ed, but they only make requests for NTP and DNS.)

Note that our current topology does not play nice with the LibreQoS/Preseem model that requires a transparent bridge...

  2. From your description I suspect that PPPoE traffic will work just fine.

rchac commented 2 years ago

Ah ok gotcha. The best bet would be to split network functions between edge and core, but I completely understand that can be a pain to do and is not always practical given space constraints. Alternatively, if your core routers run MikroTik, you could make API calls to them to dynamically create / refresh fq_codel-based shapers for each client. ROS v7 is hopefully just a few months from being stable enough to do so.

mjsteckel commented 2 years ago

Is there any reason why LibreQoS cannot run on our combined edge/core routers?

Conversely, what are the reasons for having separate edge and core routers?

Note, we have nearly zero firewall rules, and the ones we have are either to a) block IPs/subnets that abusively scan our network and customers or b) suspend service for non-paying customers.

No MikroTik in our network so far, and we don't expect to add it. Current core routers are a pair of HP DL-380 G6s running CentOS 7. These are soon to be upgraded to a pair of HP DL-380 G8s, with more (and faster) cores and a bunch more NICs. The new routers will likely run VyOS.

Most other routers in our network are Ubnt EdgeRouters of some sort. However, due to growing frustration with EdgeOS we are moving away from EdgeRouters at our major sites. We are replacing them with small SuperMicro servers running VyOS. See: https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-9D-8CN8TP.cfm

rchac commented 2 years ago

LibreQoS could definitely run downstream of your core routers. The only problem is that xdp-cpumap-tc, which LibreQoS uses to match packets, does not support VLAN tagging. I'm guessing that south of your core routers is an aggregation switch, correct? If VLANs are used between the core router and the aggregation switch, LibreQoS won't correctly match packets. However, if VLANs are not used, it would most likely work without any redesign.

mjsteckel commented 2 years ago

Sigh... We use VLANs.

mjsteckel commented 2 years ago

No VLANs... So any idea if xdp-cpumap-tc works with bonded interfaces?

rchac commented 2 years ago

Darn. I'm sorry. It may work with bonded interfaces but I'm not completely sure. https://github.com/xdp-project/xdp-cpumap-tc

mjsteckel commented 2 years ago

From a very quick grep of the xdp-cpumap-tc code, I see references to disabling VLAN offloading, but nothing about not working with VLANs altogether.

dtaht commented 2 years ago

https://github.com/xdp-project/xdp-tutorial/blob/master/packet01-parsing/README.org#assignment-4-adding-vlan-support

mjsteckel commented 2 years ago

The VLAN header structure is already in xdp_iphash_to_cpu_kern.c and tc_classify_kern.c:

struct vlan_hdr {
        __be16 h_vlan_TCI;
        __be16 h_vlan_encapsulated_proto;
};

And while proto_is_vlan() is not defined as outlined in assignment 4, its functionality is in the same files as above. See:

https://github.com/xdp-project/xdp-cpumap-tc/blob/888cc7712f2516d386a837aee67c5b05bd04edfa/src/tc_classify_kern.c#L144

        /* Handle VLAN tagged packet */
        if (eth_type == bpf_htons(ETH_P_8021Q) ||
            eth_type == bpf_htons(ETH_P_8021AD)) {
                struct vlan_hdr *vlan_hdr;

                vlan_hdr = (void *)eth + offset;
                offset += sizeof(*vlan_hdr);
                if ((void *)eth + offset > data_end)
                        return false;
                eth_type = vlan_hdr->h_vlan_encapsulated_proto;
        }
        /* Handle double VLAN tagged packet */
        if (eth_type == bpf_htons(ETH_P_8021Q) ||
            eth_type == bpf_htons(ETH_P_8021AD)) {
                struct vlan_hdr *vlan_hdr;

                vlan_hdr = (void *)eth + offset;
                offset += sizeof(*vlan_hdr);
                if ((void *)eth + offset > data_end)
                        return false;
                eth_type = vlan_hdr->h_vlan_encapsulated_proto;
        }
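
To make the logic of that excerpt easier to follow outside the kernel, here is a user-space sketch of the same idea: skip up to two VLAN tags, with a bounds check before each read, and report where the L3 header starts. It is not the xdp-cpumap-tc code. The names `parse_l3_offset`, `read_be16` and `is_vlan` are hypothetical, and the frame is assumed to be a plain byte buffer starting at the Ethernet header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN      14      /* dst MAC + src MAC + EtherType */
#define VLAN_HDR_LEN     4       /* TCI + encapsulated EtherType */
#define ETHERTYPE_8021Q  0x8100
#define ETHERTYPE_8021AD 0x88A8

/* Read a big-endian 16-bit value from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static bool is_vlan(uint16_t eth_type)
{
    return eth_type == ETHERTYPE_8021Q || eth_type == ETHERTYPE_8021AD;
}

/*
 * Skip up to two VLAN tags. On success, store the inner EtherType and the
 * offset of the L3 header and return true. Return false when the frame is
 * too short, which plays the role of the data_end check in the excerpt.
 */
static bool parse_l3_offset(const uint8_t *frame, size_t len,
                            uint16_t *eth_type, size_t *l3_offset)
{
    assert(frame != NULL && eth_type != NULL && l3_offset != NULL);

    size_t offset = ETH_HDR_LEN;
    if (offset > len)
        return false;
    uint16_t type = read_be16(frame + 12);

    for (int tag = 0; tag < 2 && is_vlan(type); tag++) {
        if (offset + VLAN_HDR_LEN > len)
            return false;
        /* The encapsulated EtherType is the last two bytes of the tag. */
        type = read_be16(frame + offset + 2);
        offset += VLAN_HDR_LEN;
    }

    *eth_type = type;
    *l3_offset = offset;
    return true;
}
```

With an untagged IPv4 frame the reported offset is 14, with one 802.1Q tag it is 18, and with a QinQ pair it is 22. A frame with a third tag would come back with a VLAN EtherType still set, just as the two-step kernel code would leave it.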
dtaht commented 2 years ago

"Is there any reason why LibreQoS can not run on our combined edge/core routers?"

The xdp dependency would hurt you here. We'd fought for years to make htb scale before this concept arrived, and certainly the bump in the wire approach is helpful in many cases.

However, vyatta does have fq_codel and, I think, cake, so something equivalent could be constructed on those routers, which I consider highly desirable as well. I care that people understand how to use this stuff on any platform, and I'm interested primarily in how to improve the underlying infrastructure they need to construct "fits" to their topologies and business models. The mikrotik thread exposed MPLS as a problem that I hadn't thought about much. The UBNT "smart queues" implementation was contributed by the userbase 7? years back, and then forked into vyatta. Trying to explain your uses to the vyatta userbase and their toolkit on their forums is also a great idea.

A version that leveraged per-customer DRR + cake might be good on vyatta. You'd need a ton of filters (for ipv6) and/or a veth device with a bunch of routes on it....
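
As a rough illustration of what "per-customer DRR + cake" could look like, the sketch below formats the `tc` commands for one IPv4 customer: a DRR class, a cake leaf that does the rate limiting, and a u32 filter that steers the customer's traffic into the class. Everything about it is an assumption rather than something from this thread: the helper name `format_customer_tc`, the 1: root handle (which presumes `tc qdisc add dev <dev> root handle 1: drr` was run first), and the choice of u32 filters. It only builds strings and does not run them. As far as I understand DRR, traffic that no filter classifies is dropped, so a real setup would also need a catch-all class.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the three tc commands for one customer into buf.
 * class_minor is printed in hex because tc class IDs are hexadecimal.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_customer_tc(char *buf, size_t buflen, const char *dev,
                              unsigned class_minor, const char *ipv4,
                              unsigned rate_mbit)
{
    assert(buf != NULL && dev != NULL && ipv4 != NULL);
    assert(strlen(dev) > 0 && strlen(ipv4) > 0);
    assert(class_minor > 0 && class_minor <= 0xffff);

    int n = snprintf(buf, buflen,
        "tc class add dev %s parent 1: classid 1:%x drr\n"
        "tc qdisc add dev %s parent 1:%x cake bandwidth %umbit\n"
        "tc filter add dev %s parent 1: protocol ip prio 1 u32 "
        "match ip dst %s/32 flowid 1:%x\n",
        dev, class_minor,
        dev, class_minor, rate_mbit,
        dev, ipv4, class_minor);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

Calling it with device `eth1`, class 10, address `192.0.2.10` and a 50 Mbit plan yields commands that use class ID `1:a`. One class, one qdisc and at least one filter per customer is where the "ton of filters" comes from.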

Let a thousand lovely smart queue management systems blossom!

mjsteckel commented 2 years ago

Thanks for the detailed response.

As I've been exploring this all day I think the only manageable way to approach it is with a middle box/transparent bridge.

Yes, it would mean re-configuring our physical and logical network topology at the data center. Not complicated, just a bunch of very detail-oriented work to make sure the transition is smooth.

The edge and core routers would run VyOS, and the middle box would just run the latest Debian.

marsalans commented 2 years ago

> Is there any reason why LibreQoS cannot run on our combined edge/core routers?
>
> Conversely, what are the reasons for having separate edge and core routers?
>
> Note, we have nearly zero firewall rules, and the ones we have are either to a) block IPs/subnets that abusively scan our network and customers or b) suspend service for non-paying customers.
>
> No MikroTik in our network so far, and we don't expect to add it. Current core routers are a pair of HP DL-380 G6s running CentOS 7. These are soon to be upgraded to a pair of HP DL-380 G8s, with more (and faster) cores and a bunch more NICs. The new routers will likely run VyOS.
>
> Most other routers in our network are Ubnt EdgeRouters of some sort. However, due to growing frustration with EdgeOS we are moving away from EdgeRouters at our major sites. We are replacing them with small SuperMicro servers running VyOS. See: https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-9D-8CN8TP.cfm

Dear @mjsteckel, how many customers are you connecting to your HP box, and how much traffic crosses it?

interduo commented 2 years ago

> No MikroTik in our network so far, and we don't expect to add it. Current core routers are a pair of HP DL-380 G6s running CentOS 7. These are soon to be upgraded to a pair of HP DL-380 G8s, with more (and faster) cores and a bunch more NICs. The new routers will likely run VyOS.

VyOS doesn't support DPDK.

marsalans commented 2 years ago

> > No MikroTik in our network so far, and we don't expect to add it. Current core routers are a pair of HP DL-380 G6s running CentOS 7. These are soon to be upgraded to a pair of HP DL-380 G8s, with more (and faster) cores and a bunch more NICs. The new routers will likely run VyOS.
>
> VyOS doesn't support DPDK.

But it supports XDP in its latest (unstable) release.

dtaht commented 2 years ago

I wish we could reprioritize, rather than close? I LIKED vyatta a lot (it is also the root of the edgeos in ubnt's work). But they missed how best to autoconf cake, and a lot of their customers (such as @mjsteckel) have gear deployed that just needs better shaping in general, and better statistics, in general.

thebracket commented 2 years ago

On my projects, I use tags for that. I have a "distant future" tag for things I don't want to worry about yet, as well as the usual "bug", "feature request", etc. tags. Makes searching easy.

On Thu, Oct 20, 2022 at 11:53 AM Dave Täht wrote:

> I wish we could reprioritize, rather than close? I LIKED vyatta a lot (it is also the root of the edgeos in ubnt's work). But they missed how best to autoconf cake, and a lot of their customers (such as @mjsteckel) have gear deployed that just needs better shaping in general, and better statistics, in general.

dtaht commented 2 years ago

I live and die by emacs org-mode. So does toke. GTD is deeply embedded in me.

rchac commented 2 years ago

Sorry, I was trying to clean up the many open issues. I saw that the original issue was about whether LibreQoS supports the different site roles mentioned. It does indeed support all of the roles inquired about, and has since we introduced v1.1. Also, the topic is stale. For those two reasons I closed the issue.

Looking over it again I see that the subject changed to how it is necessary to rework mjsteckel's network to have an edge/core router setup.

I will be sure to tag issues instead of closing.

That said I'm not sure exactly what the current issue on this thread is.

dtaht commented 2 years ago

@mjsteckel can I coax you to get into the beta on some subset of your network? We're taking a poll on vyatta support elsewhere, but so far you are the only one requesting it.

mjsteckel commented 2 years ago

@dtaht Until VyOS v1.4 is released, which will hopefully include a new enough kernel, it does not make sense for LibreQoS to consider supporting VyOS. So thanks for thinking of me, but put this on the back burner for consideration sometime down the road.

I want to deploy LibreQoS ASAP, but unfortunately we have a bunch that has to get done first. The biggest technical hurdle is that we need to separate our network edge routers running BGP from our internal routing functionality, which is (obviously) needed to support a middle-box shaper.

From a project standpoint, our priorities are to complete an upgrade to a major tower site, update equipment on the data center roof, and finally install new routers at the data center. The DC routers were initially only going to be a hardware upgrade, but now the work will also include adding routers to separate functionality. All this is happening while we are growing significantly...

dtaht commented 2 years ago

Is there a specific repo over here we could file a bug against and track against this milestone? VyOS is a MUCH bigger project now than I remember (which is awesome!) https://github.com/vyos

mjsteckel commented 2 years ago

The VyOS web site used to have a link to their roadmap. I can no longer find the link, but I did find the roadmap itself, with the caveat that I have no idea if it is still updated/managed. See: https://portal.productboard.com/vyos/1-vyos-roadmap/tabs/2-planned

The link is to the planned features which includes support for cake.

dtaht commented 2 years ago

I bugged @vyos about this via twitter, and email.

dtaht commented 1 year ago

@mjsteckel can you give us an update on vyos's status?

dtaht commented 1 year ago

@marsalans - you still with us?

marsalans commented 1 year ago

> @marsalans - you still with us?

yes

marsalans commented 1 year ago

Actually, I tuned MikroTik for 10 Gbps and stopped offloading it to LibreQoS. The company has now ordered Huawei routers, which will replace the MikroTik in a few months.

mjsteckel commented 1 year ago

> @mjsteckel can you give us an update on vyos's status?

(Was away for the weekend... and just got back.) VyOS deployment will happen very soon. Having said that, this issue had nothing to do with VyOS (though it was discussed in the comments).

Totally fine by me to close this issue. I had a very limited understanding of lqos when I created the issue and was just trying to learn/understand how lqos functioned.