dtaht / sch_cake

Out of tree build for the new cake qdisc

Overhead on ppp corrected -14 as if eth. #51

Closed AndyFurniss closed 7 years ago

AndyFurniss commented 7 years ago

Hi, I've been testing cake master on a ppp interface with a view to working out PTM overheads on UK VDSL2.

Working out the overheads failed, as it turned out that the modem does its own backed-off QOS using rate tables with >>2 and, strangely, sometimes >>3 banding on packet size.

I did notice that the overhead parameter needs an extra +14 added: it seems that cake treats ppp like eth and assumes the packet length it sees already includes the 14-byte header, but on ppp tc sees the IP length.

chromi commented 7 years ago


Looking at Cake’s detailed stats (tc -s qdisc), what is the reported maximum packet length? If it is 1514, that indicates that Cake is, in fact, being given Ethernet-framed packets, as the maximum size of IP packets is (in most environments) 1500 bytes.

Cake always tries to specify overheads relative to an IP packet without additional framing. This additional framing happens to be given by an interface parameter in Linux, which I assumed is accurate. If it is not, then I may need to revisit.

If you’re certain that it’s wrong, then you can manually specify the overhead (without needing to know that it’s a 14-byte difference) using “raw overhead N”. The “raw” keyword turns off the use of that interface parameter.

AndyFurniss commented 7 years ago

1500 is the max I see in stats (it's pppoe with mini jumbos).

I will later treble check that the issue is real.

I forgot to put last night that I am running 4.1.36 on my router box and had to modify code to get it to build.

diff --git a/sch_cake.c b/sch_cake.c
index 17f8618..3a4c0e1 100644
--- a/sch_cake.c
+++ b/sch_cake.c
@@ -69,7 +69,7 @@
 #include <net/netfilter/nf_conntrack.h>
 #endif
-#if (KERNEL_VERSION(4,4,11) > LINUX_VERSION_CODE) || ((KERNEL_VERSION(4,5,0) <= LINUX_VERSION_CODE) && (KERNEL_VERSION(4,5,5) > LINUX_VERSION_CODE))
+#if 0
 #define qdisc_tree_reduce_backlog(_a,_b,_c) qdisc_tree_decrease_qlen(_a,_b)
 #endif

On the raw parameter: I did try that, but the test failed, and I noticed that raw wasn't mentioned in the output of tc -s qdisc ls when used with overhead. I assumed the two were mutually exclusive, as raw is shown when no overhead is given.

It seems, however, that raw has an issue. I am now testing on an eth interface on my desktop, which runs a current git kernel and needs no hacks to build.

tc qdisc add dev enp6s0 handle 1:0 root cake bandwidth 2mbit raw diffserv4

reliably gets more tcp throughput as measured by netperf than

tc qdisc add dev enp6s0 handle 1:0 root cake bandwidth 2mbit diffserv4

Which is the opposite of what I expected.

AndyFurniss commented 7 years ago

Hmm, I am a bit confused about what raw is supposed to do. Testing on eth again.

root [~]# tc qdisc del dev enp6s0 root
root [~]# tc qdisc add dev enp6s0 handle 1:0 root cake bandwidth 2mbit diffserv4
root [~]# tc -s qdisc ls dev enp6s0
qdisc cake 1: root refcnt 2 bandwidth 2Mbit diffserv4 triple-isolate rtt 100.0ms noatm overhead 14

So overhead 14 "appeared" and testing with time deltas from tcpdump I am (as expected I guess) under rate for 1500 byte ip length calculation.

root [~]# tc qdisc del dev enp6s0 root
root [~]# tc qdisc add dev enp6s0 handle 1:0 root cake bandwidth 2mbit raw diffserv4
root [~]# tc -s qdisc ls dev enp6s0
qdisc cake 1: root refcnt 2 bandwidth 2Mbit diffserv4 triple-isolate rtt 100.0ms raw

I expect here that raw means calculations are done for 1514 - but tcpdump tells me that packets are released exactly correctly as if calculation was done on 1500 bytes.

ldir-EDB0 commented 7 years ago

Are you using the latest, matching cake-aware tc? cake/tc recently gained more awareness of Linux's additional frame overhead, and if that info is being used, tc reports 'via-ethernet'. The fact that you're not seeing that in either case makes me think there's a cake/tc generation mismatch. e.g.

tc -s qdisc show dev eth0
qdisc cake 800a: root refcnt 2 bandwidth 19600Kbit diffserv3 dual-srchost nat rtt 100.0ms noatm overhead 12 via-ethernet

So that's actually a total of 26 bytes: 14 bytes of overhead that Linux knows about as part of the skb, plus another 12 that I (in theory) know about, which is something like 5 PTM, 4 FCS, 4 VLAN (yes, I know I can't add... no idea!). And that's for a router ethernet interface to an HG612 modem on BT's VDSL2 infrastructure with Sky as the ISP, which is more 'ethernet in the last mile' than BT, so it does DHCP rather than running yet another PPPoE encapsulation (so your overheads could well be another 8 bytes higher).

AndyFurniss commented 7 years ago

andy [iproute2-cake]$ git remote -v
origin git://kau.toke.dk/cake/iproute2/ (fetch)
origin git://kau.toke.dk/cake/iproute2/ (push)
andy [iproute2-cake]$ git pull
Already up-to-date

Is that the right one?

I am plodding on slowly trying to work things out - it's a bit trickier testing on my ppp as there is often other traffic and the timers don't seem as accurate as on my desktop.

Current feeling is that maybe there is a 22 byte difference that sometimes needs to be corrected for - it's early days, I may be wrong. Repeating the above test that makes an overhead "appear" gives 22 on my ppp0 - which I guess is correct for pppoe. The issue may be that it's getting accounted for when I try to specify a manual overhead. This would mean the title of this issue is incorrect and the 14 needs replacing with 22. As I said though, testing on ppp for packet release delta times is a bit tricky.

On the PTM overheads you give, I never could see where 5 came from - I've stared at the spec and only managed to see 4. In fact in the UK there is a BT SIN with an example that only works out with 4 for PTM + 4 VLAN (after reducing sync rate * 64/65). Downstream it is not possible for me to test, as BT limit transmission rate < sync even allowing for overheads. Upstream I was hopeful, but my unlocked HG612 is way off and is doing its own QOS below ptm (explicit ip QOS is turned off). I know this as LCP pings don't get lagged out.

The way it does this looks like cell log lookup as I said above.

ldir-EDB0 commented 7 years ago

AFAIK https://github.com/dtaht/tc-adv is the up to date 'official' - I can't remember where 5 came from..which is probably why my actual system has it at 4. Doh!

moeller0 commented 7 years ago


For what it is worth, here is my reading of the spec:

VDSL2 (IEEE 802.3-2012 61.3 is relevant for VDSL2): 1 byte Start of Frame (S), 1 byte End of Frame (Ck), 2 byte TC-CRC (PTM-FCS) = 4 bytes. Add COMMON: 4 byte Frame Check Sequence (FCS) + 6 (dest MAC) + 6 (src MAC) + 2 (ethertype) = 18 bytes, and potentially 2 byte PPP + 6 byte PPPoE + 4 byte VLAN = 12 bytes, for a total of 4 + 18 + 12 = 34. Note that with a typical MTU of 1500 this reduces to 1500 + 34 - 8 = 1526 on-the-wire size, but with baby jumbos you really should shape with 1534.

Now, if the situation in the UK is in any way like in DE, the ISP will have a shaper at its BRAS/BNG level, and hence cake needs to be adjusted for that shaper’s values and not the VDSL2 link (assuming the ISP is competent enough to actually configure the BRAS/BNG shaper correctly). Also note that the presence of the FCS basically means that on the VDSL link there is an effective MPU of 64 bytes, as I cannot believe that a VDSL modem would take a 64 byte frame with padding, remove the padding and recalculate the FCS on send, and undo the whole thing on receive…
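The byte counts in this breakdown can be summed mechanically. The following is a minimal sketch of that arithmetic only; the dictionary and function names are made up for illustration, and the per-field sizes are simply the ones listed above.

```python
# Per-packet overhead on top of the IP packet, using the field sizes
# quoted in the comment above (names here are illustrative only).
PTM = {"start_of_frame": 1, "end_of_frame": 1, "tc_crc": 2}
ETHERNET = {"fcs": 4, "dst_mac": 6, "src_mac": 6, "ethertype": 2}
OPTIONAL = {"ppp": 2, "pppoe": 6, "vlan": 4}


def total_overhead(*layers):
    """Sum the byte counts of every field in every layer."""
    return sum(sum(layer.values()) for layer in layers)


def wire_size(ip_len, overhead):
    """On-the-wire size of one packet of ip_len bytes."""
    return ip_len + overhead


overhead = total_overhead(PTM, ETHERNET, OPTIONAL)
pppoe_bytes = OPTIONAL["ppp"] + OPTIONAL["pppoe"]

print(overhead)                                 # 34
print(wire_size(1500 - pppoe_bytes, overhead))  # 1526: PPP/PPPoE eat 8 bytes of a 1500 MTU
print(wire_size(1500, overhead))                # 1534: baby jumbos keep the IP MTU at 1500
```

Dropping a layer (for example `OPTIONAL` on a link without PPPoE or VLAN tagging) gives the corresponding smaller figure.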

Best Regards Sebastian


moeller0 commented 7 years ago

Hi Kevin,

On Feb 20, 2017, at 19:50, Kevin Darbyshire-Bryant notifications@github.com wrote:

AFAIK https://github.com/dtaht/tc-adv is the up to date 'official' - I can't remember where 5 came from..which is probably why my actual system has it at 4. Doh!

Maybe from the VDSL1 spec: VDSL1 (see G.993.1 Annex H): 1 byte Opening Flag Sequence (OFS), 1 byte Address Field (AF), 1 byte Control Field (CF), 2 byte TC-CRC (PTM-FCS), 1 byte Closing Flag Sequence (CFS, which might be omitted with back-to-back PTM frames) = 5 - 6 bytes (effectively 5, since 6 would only happen if there were idle octets, so not under full-rate traffic conditions). Also, VDSL1 used HDLC with byte stuffing; in my layman’s eyes PTM is so much simpler to understand that I am thankful to the ITU gods that they put both ATM and VDSL1 to rest for the comparatively easy to wrap one’s head around PTM… but I digress. Also, for some time I was confused and actively posted the VDSL1 numbers to one of our mailing lists, so in the end I might be to blame for your recollection...

Best Regards Sebastian


AndyFurniss commented 7 years ago

Thanks for the replies and link to the correct iproute2 (ugh, don't know how I ended up with the wrong one)

Things are somewhat saner with that - though still a bit confusing for the unwary. The output does though, show what is going on.

So raw is default now -

tc qdisc del dev ppp0 root
tc qdisc add dev ppp0 handle 1:0 root cake bandwidth 2mbit diffserv4
tc -s qdisc ls dev ppp0
qdisc cake 1: root refcnt 2 bandwidth 2Mbit diffserv4 triple-isolate rtt 100.0ms raw

Rates are measured at IP level.

So I want ip +34 and do

tc qdisc del dev ppp0 root
tc qdisc add dev ppp0 handle 1:0 root cake bandwidth 2mbit overhead 34 diffserv4
tc -s qdisc ls dev ppp0
qdisc cake 1: root refcnt 2 bandwidth 2Mbit diffserv4 triple-isolate rtt 100.0ms noatm overhead 34 via-ethernet

via-ethernet has appeared, which gives me a clue that I am not going to get ip + 34.

Try adding raw overhead 34 = success

tc qdisc del dev ppp0 root
tc qdisc add dev ppp0 handle 1:0 root cake bandwidth 2mbit raw overhead 34 diffserv4
tc -s qdisc ls dev ppp0
qdisc cake 1: root refcnt 2 bandwidth 2Mbit diffserv4 triple-isolate rtt 100.0ms noatm overhead 56 via-ethernet

The overhead has been adjusted by +22, so via-ethernet means different things depending on the interface.
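One way to read the two tc outputs above is that, whenever via-ethernet is shown, the displayed overhead still has the interface's framing adjustment (22 on this ppp0) taken off before it is applied to the IP length. The sketch below only encodes that reading of the output; it is an assumption and not cake's actual code, and the function and parameter names are invented for illustration.

```python
def effective_overhead(displayed_overhead, framing_adjust, via_ethernet=True):
    """Bytes added to the IP length under the reading described above.

    displayed_overhead: the 'overhead N' value printed by tc -s qdisc
    framing_adjust:     the per-interface adjustment (22 observed on ppp0 here)
    via_ethernet:       whether tc printed 'via-ethernet'
    """
    if via_ethernet:
        return displayed_overhead - framing_adjust
    return displayed_overhead


PPP0_ADJUST = 22  # value observed on ppp0 in this thread, not a constant of cake

print(effective_overhead(34, PPP0_ADJUST))  # 12: plain 'overhead 34' falls short of ip + 34
print(effective_overhead(56, PPP0_ADJUST))  # 34: 'raw overhead 34' is displayed as 56
```

On the eth interface tested earlier the same adjustment would be 14, which is why the meaning of the displayed number depends on the interface.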

Quick test = seems OK vs calculated transmit time, but timings a bit variable on this CPU vs my desktop so harder to tell.

ldir-EDB0 commented 7 years ago

It may be worth tweaking the 'via-overhead' display to report the amount of overhead the kernel already knows about:

AndyFurniss commented 7 years ago

Yea, it would be handy to see what's going on more explicitly.

On previous discussion on overheads. I assumed FCS would be sent, but I've seen calculations in an official UK BT/openreach doc that imply it isn't - but they may be wrong.

If they are right then my huawei is even further off expected than the measurements I made.

The straight line on the graphs is calculated tcp using ip +34 as overhead. There is no qos between me and the modem. The points are netperf results from a script that does a 30 second run to flent-london.bufferbloat.net, then decreases the MTU and runs again.
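A straight line of this kind could be computed roughly as follows. This is a sketch under stated assumptions rather than the script actually used for the graphs: it assumes IPv4 with TCP timestamps (20 + 20 + 12 = 52 header bytes per packet), full-size packets only, and a link rate that counts every on-the-wire byte. The function name and the 20 Mbit example rate are illustrative.

```python
def expected_tcp_goodput(link_rate_bps, mtu, overhead=34, ip_tcp_headers=52):
    """Expected TCP payload rate in bit/s for back-to-back full-size packets.

    Each packet carries (mtu - ip_tcp_headers) bytes of payload but occupies
    (mtu + overhead) bytes on the wire, so goodput is the link rate scaled by
    that ratio.
    """
    payload = mtu - ip_tcp_headers
    wire = mtu + overhead
    return link_rate_bps * payload / wire


# Sweep the MTU downwards, as the test script described above does.
for mtu in range(1500, 999, -100):
    mbit = expected_tcp_goodput(20_000_000, mtu) / 1e6
    print(mtu, round(mbit, 3))
```

Because the fixed per-packet cost weighs more on small packets, the expected goodput falls as the MTU is reduced.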

https://drive.google.com/drive/folders/0BxP5-S1t9VEENFNLeFhmeHljMW8?usp=sharing

Not really worth me messing with PTM overheads when faced with this :-(

I am going to dig out my locked down ECI modem soon and see what that is like.

moeller0 commented 7 years ago

On Feb 20, 2017, at 22:53, AndyFurniss notifications@github.com wrote:

Yea, it would be handy to see what's going on more explicitly.

On previous discussion on overheads. I assumed FCS would be sent, but I've seen calculations in an official UK BT/openreach doc that imply it isn't - but they may be wrong.

Well, according to ITU G992-3_minusAnnexC_2009-04-final.pdf Annex N it is possible to use shorter than 64Byte packets over PTM, but also says: “NOTE 4 – If the PTM-TC carries IEEE 802.3 (Ethernet) packets, it is assumed that the preamble and SFD fields have been discarded by the PTM entity before transmitting the packets to the PTM-TC. See clause 61.1.4.1.2 of [IEEE 802.3]."

I interpret this as meaning the FCS is still part of the packet. But the ITU standards are decidedly mum about the FCS. However the IEEE’s 802.3-2012 Section 5 has: “61.3.3.3 TC-CRC functions The TC-CRC is generated for the entire payload fragment including any attached header (from PAF), including the Ethernet CRC; i.e., the TC-CRC is computed over octets from the first octet of the PAF header (if present), or the first octet of the DestinationAddress (in the case where the PAF header not present), to the last octet of the Ethernet CRC (for a frame) or the last octet of the fragment (if PAF fragmentation is operating), inclusive. ..."

Again, this pretty much sounds like the FCS is included (note the PAF seems to be a method to fragment ethernet frames, that still covers the FCS). Also I have a hard time understanding how not transferring the FCS offers anything to an ISP (it just wastes some bandwidth… while excluding it forces one to recalculate it at the other side of the ptm link). I admit though I have no real data to back me up (I wonder whether there is a vdsl2 modem that allows access to the PTM bit stream?)

If they are right then my huawei is even further off expected than the measurements I made.

The straight line on the graphs is calculated tcp using ip +34 as overhead. There is no qos between me and the modem. The points are netperf results from a script that does a 30 second run to flent-london.bufferbloat.net decrease MTU then run again.

https://drive.google.com/drive/folders/0BxP5-S1t9VEENFNLeFhmeHljMW8?usp=sharing

Wahoo, that is odd, especially this 4 byte frequency/pumping...

Not really worth me messing with PTM overheads when faced with this :-(

Well, in spite of the 4 byte cycling, there is overall a decent correlation between the straight line and the flent measurements; it is just that it would be nice to understand where the offset is coming from, no?

I am going to dig out my locked down ECI modem soon and see what that is like.

I would be delighted if you would keep me in the loop, this is fascinating.

Thanks & Best Regards


AndyFurniss commented 7 years ago

That does indeed read like FCS is included.

The 4 byte pumping I guess is just how they implemented their QOS, TC IIRC does the same for some shapers = shift packet size right and match to a bin with a pre-calculated time so some packets get shaped like they are bigger. I can't explain why it's sometimes 8 in the graphs though.
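The binning described here can be illustrated with a simplified rate-table lookup. This is only a sketch of the general idea: the real tc lookup applies further adjustments, and how the modem actually implements its QOS is unknown. The function names are invented for illustration.

```python
def charged_size(pkt_len, cell_log):
    """Size a packet is charged for when sizes are binned by a right shift.

    The packet length is shifted right by cell_log to pick a bin, and the bin
    is priced at its upper edge, so every packet in a bin costs the same.
    """
    return ((pkt_len >> cell_log) + 1) << cell_log


def xmit_time_us(pkt_len, rate_bps, cell_log):
    """Pre-calculated transmit time (microseconds) for the packet's bin."""
    return charged_size(pkt_len, cell_log) * 8 * 1_000_000 / rate_bps


# With >>2 the charged size steps every 4 bytes, with >>3 every 8 bytes.
for pkt_len in range(1496, 1505):
    print(pkt_len, charged_size(pkt_len, 2), charged_size(pkt_len, 3))
```

A sweep over packet sizes with this kind of lookup produces a staircase rather than a straight line, with a step width of 4 or 8 bytes depending on the shift.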

Maybe the offset is because this is bespoke teleco issued kit, and I've just re-found the doc that shows the calculation that doesn't seem to add up if FCS is included. Nearby it mentions uploading other data once a day. Page 7 http://www.sinet.bt.com/sinet/SINs/pdf/498v7p3.pdf As I said earlier, I know there is QOS going on for ppp LCP echo requests: no matter how much I flood the modem buffer (120 ms with 1500 ip len) with IP, these are not affected. Coincidentally, my ISP now gives out different all in one routers, and a recent firmware upgrade on those broke this behavior. There is a long thread on their forum where people on the lower 2mbit upload package they sell lose their ppp session when doing big uploads, as the ISP PPP kit thinks they are down.

I'll post back when I've tested my other ECI modem - though this is also teleco supplied so may well be reserving some bandwidth/doing QOS - hopefully "better" than the Huawei.

moeller0 commented 7 years ago

Hi Andy,

On Feb 21, 2017, at 01:24, AndyFurniss notifications@github.com wrote:

That does indeed read like FCS is included.

The 4 byte pumping I guess is just how they implemented their QOS, TC IIRC does the same for some shapers = shift packet size right and match to a bin with a pre-calculated time so some packets get shaped like they are bigger. I can't explain why it's sometimes 8 in the graphs though.

Maybe the offset is because this is bespoke teleco issued kit and I've just re-found the doc that shows the calculation that doesn't seem to add up if FCS is included. Nearby it mentions uploading other data once a day. Page 7 http://www.sinet.bt.com/sinet/SINs/pdf/498v7p3.pdf

IMHO, they seem confused, they state: “R.ETH.1 The modem shall support an Ethernet frame size of between 68 and 1534 bytes. For clarity, this figure includes 4 bytes for the C-VLAN, and excludes bits allocated to pre-amble, Inter-Frame Gap, and Frame Check Sequence at the user network interface (UNI). Support for frame sizes above 1534 bytes (inclusive of C-VLAN) is not guaranteed.” but looking at https://en.wikipedia.org/wiki/Ethernet_frame it seems clear that the 68 byte number is the minimal ethernet frame size with a VLAN tag that _includes_ the FCS. In addition I can not make any sense out of the numbers shown on page 7, no matter how I slice and dice this (so most likely I am missing something here)

As I said earlier I know there is QOS going on for ppp LCP echo requests, no matter how much I flood the modem buffer (120 ms with 1500 ip len) with IP these are not affected.

I assume you create the ppp tunnel somewhere and send the data through the pppoe device? In that case I would assume the pppoe device would simply inject the pppoe LLC packets with priority and be done with it (alternatively, the VLAN PCPs could be used to mark the LLC/LCP packets as precious).

Co-incidently my ISP now give out different all in one routers, and a recent firmware upgrade on those broke this behavior. There is a long thread on their forum where people on the lower 2mbit upload package they sell, loose their ppp session when doing big uploads as the ISP PPP kit thinks they are down.

Mmmh, interesting. If the fix for the problem could be scrutinized, that might allow us to figure out how openreach intends to special-case llc packets.

I'll post back when I've tested my other ECI modem - though this is also teleco supplied so may well be reserving some bandwidth/doing QOS - hopefully "better" than the Huawei.

Please do, some consider this to be quite dry, but I can not help myself to be intrigued by it. And I am of the opinion that for proper shaping one really needs to know what is really crossing the wires ;)

Best Regards


AndyFurniss commented 7 years ago

I couldn't get a clear enough line to do a lot of plots so far, but here's one for now. No pumping on the ECI modem - still off a bit from expected. https://drive.google.com/drive/folders/0BxP5-S1t9VEELVItbnRwM2dCZGM?usp=sharing

I don't know what the LCP fix was for the ISP supplied router; testing with the ECI it doesn't seem to prio LCP like the huawei. The buffers on both seem to be pfifo, but the ECI's is much smaller: at my capped 20mbit sync I could lag ping up to 120ms on the huawei, but more like 25-30ms on the ECI.

On my setup WRT PPP, the modems are bridged to eth so I set up ppp0 on my separate router PC. ppp only sees ip traffic, but I can of course see all the ppp frames including LCP by looking on the eth. As far as the modem is concerned it's just getting pppoe frames (it will also eat all multicast/broadcast IP packets as well). I hope I didn't make some horrendous power of 10 error when looking at the LCP time deltas and concluding the Huawei was doing QOS!

On the calculation (unless I messed up) I can get 8 as the fixed overhead with the assumption that the examples were rounded up to whole kbit. I agree this seems not to match anything I can measure - maybe whoever did it just forgot about FCS. The bras rate set for me is certainly less than that would imply. I sync at 65mbit and the rate for that sync shows (on their web checker) as 62.5mbit.

I think (!) the logic I used was to calculate packets/sec using the example net rate and size, then use that figure to see how big the packets would need to be to fill the sync rate * 64/65. It was a while ago but this is what I pasted into my notes -

for X in $(seq 35720600 100 35721400); do echo "scale=10;((40000000 * (64/65)) / 8) / ($X / 8 / 78)" | bc; done

for X in $(seq 39177600 100 39178400); do echo "scale=10;((40000000 * (64/65)) / 8) / ($X / 8 / 1514)" | bc; done
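The same calculation as the two bc loops, written out as a function, might look like this. It is a sketch of the logic described above: the 40 Mbit sync rate and the packet sizes come from the loops, while the two net rates are simply the midpoints of the seq ranges and are used here only as illustrative inputs.

```python
def implied_wire_size(sync_rate_bps, net_rate_bps, pkt_len):
    """Bytes each packet must occupy on the line to fill sync rate * 64/65.

    The net rate and packet size give packets per second; dividing the usable
    line rate (in bytes per second) by that gives the per-packet wire size.
    """
    line_bytes_per_s = sync_rate_bps * 64 / 65 / 8
    packets_per_s = net_rate_bps / 8 / pkt_len
    return line_bytes_per_s / packets_per_s


for net_rate, pkt_len in ((35_721_000, 78), (39_178_000, 1514)):
    wire = implied_wire_size(40_000_000, net_rate, pkt_len)
    # Print packet size, implied wire size and the implied fixed overhead.
    print(pkt_len, round(wire, 2), round(wire - pkt_len, 2))
```

Both packet sizes come out at roughly 8 bytes above the stated size, which matches the fixed overhead of 8 mentioned above.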

AndyFurniss commented 7 years ago

testing with the ECI it doesn't seem to prio LCP like the huawei

Turns out this was a false assumption. Testing with a tcp upload + pinging at the same time, plus observing LCP showed the LCP was delayed the same as the ping.

Testing with flent udp_flood gives quite different results = ping is 300ms + loss, but LCP is only lagged by 40ms, so I think the ECI is doing something that protects LCP.

AndyFurniss commented 7 years ago

So I have come to the conclusion that testing with tcp and netperf is not the best way to go.

Using netem to add 5ms in/out and testing to netserver on my gig eth lan I see the same delta between expected and actual. I don't see it without netem. The Huawei stepping graphs are I believe valid - just they should be shifted up a bit.

The above udp_flood test was invalid because it defaults to 100M so it was buffering in my switch (gig but modem connected at 100M). Interesting and repeatable observation that the switch port when flooded delays LCP less than ICMP, both being wrapped in pppoe frames. I still think the modem is protecting LCP - no matter how hard I try (using lower udp rates) I haven't got any drops.

Moving on to putting cake on ppp0 and testing with udp_flood and it's clear that for this ECI modem overhead 34 is the correct value. With 33 latency just rises through the test, with 34 it seems to be spot on (though I haven't done this test with various packet sizes). When I say spot on it really is - every minute my upnp server blats out some multicast which bypasses the ppp with my current sub-optimal network setup. At my up rate this is expected to add a few ms - and it shows nicely on a ping graph and the "delta" doesn't drain, which implies to me that the shaping is accurate (well, for the short period of the test at least).
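Why a single byte of missing overhead shows up as steadily rising latency can be sketched as follows. This is an idealised model, not a description of the test above: it assumes the shaper is set to exactly the true link rate, that the sender keeps it saturated with packets of one size, and that the modem buffer never fills. The function name is invented for illustration.

```python
def queue_growth_s_per_s(ip_len, assumed_overhead, true_overhead):
    """Seconds of queueing delay added in the modem per second of flooding.

    The shaper releases one packet per (ip_len + assumed_overhead) byte-times,
    but the link needs (ip_len + true_overhead) byte-times to send it. Any
    excess accumulates as backlog behind the shaper.
    """
    link_time_per_shaper_time = (ip_len + true_overhead) / (ip_len + assumed_overhead)
    return max(0.0, link_time_per_shaper_time - 1.0)


for assumed in (33, 34):
    growth = queue_growth_s_per_s(1500, assumed, 34)
    # Delay accumulated over a 60 second run, in milliseconds.
    print(assumed, round(growth * 60 * 1000, 1))
```

In this model the drift is inversely proportional to packet size, so an under-estimated overhead would show up faster with small packets.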

AndyFurniss commented 7 years ago

I tried a few other packet sizes with iperf as I couldn't work out how to change udp packet size with flent. Results seem good on those tests as well. I notice while testing that the flent servers don't even have iperf enabled, but it still just floods out udp regardless :-)

Without any qos I can send 70 meg udp at the modem and ping at 50 pps without loss, so it seems the modem is doing some sort of fq without backing off from raw ptm rate.

I guess this issue should really be closed as the description is not really accurate and it's turned into OT discussion that should really be on the mailing list. If you agree please close.

dtaht commented 7 years ago

OK.