CPqD / ofsoftswitch13

OpenFlow 1.3 switch.
http://cpqd.github.com/ofsoftswitch13

packet drop after dscp remark #304


shizhenzhao commented 4 years ago

I am simulating the following network. The link capacity of (h1, s1) and (h2, s2) is 30 Mbps, and the capacity of all other links is 10 Mbps.

```
s1 lo: s1-eth1:h1-eth0 s1-eth2:s2-eth1 s1-eth3:s3-eth1 s1-eth4:s4-eth1
s2 lo: s2-eth1:s1-eth2 s2-eth2:h2-eth0 s2-eth3:s3-eth2 s2-eth4:s4-eth2
s3 lo: s3-eth1:s1-eth3 s3-eth2:s2-eth3
s4 lo: s4-eth1:s1-eth4 s4-eth2:s2-eth4
h1 h1-eth0:s1-eth1
h2 h2-eth0:s2-eth2
```
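
For reference, here is a minimal Mininet sketch of this topology. It is not the actual exper3.py; it assumes TCLink-based bandwidth limits, the CPqD user-space switch, and a remote Ryu controller already running on the default port.

```python
#!/usr/bin/env python
# Minimal sketch of the topology above (not the original exper3.py).
# Assumes Mininet's TCLink for bandwidth limits, the CPqD user-space switch,
# and a remote controller (e.g. ryu-manager) already running.
from mininet.net import Mininet
from mininet.node import UserSwitch, RemoteController
from mininet.link import TCLink
from mininet.cli import CLI

def build():
    net = Mininet(switch=UserSwitch, link=TCLink, controller=RemoteController)
    h1, h2 = net.addHost('h1'), net.addHost('h2')
    s1, s2, s3, s4 = (net.addSwitch(s) for s in ('s1', 's2', 's3', 's4'))
    # Link order reproduces the port numbering shown above.
    net.addLink(h1, s1, bw=30)   # s1-eth1 <-> h1-eth0, 30 Mbps
    net.addLink(s1, s2, bw=10)   # s1-eth2 <-> s2-eth1, 10 Mbps
    net.addLink(s2, h2, bw=30)   # s2-eth2 <-> h2-eth0, 30 Mbps
    net.addLink(s1, s3, bw=10)   # s1-eth3 <-> s3-eth1
    net.addLink(s3, s2, bw=10)   # s3-eth2 <-> s2-eth3
    net.addLink(s1, s4, bw=10)   # s1-eth4 <-> s4-eth1
    net.addLink(s4, s2, bw=10)   # s4-eth2 <-> s2-eth4
    net.start()
    CLI(net)
    net.stop()

if __name__ == '__main__':
    build()
```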

I use iperf to send UDP traffic from h1 to h2:

```
h2 iperf -u -s -p 5566 -i 1 > server.log &
h1 iperf -u -c 10.0.0.2 -b 26M -p 5566 -t 5 --tos 0x08 > client.log
```

I want 8.5 Mbps of the iperf traffic to go through the link (s1, s2), and to split the rest of the traffic evenly between the paths (s1, s4, s2) and (s1, s3, s2). We realized this routing mechanism using DSCP remark. Specifically, we set up a meter rule in s1 with rate=8500 and prec_level=1. We then forward the traffic with dscp=2 to the link (s1, s2), and split the traffic with dscp=4 between the paths (s1, s4, s2) and (s1, s3, s2).
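
For illustration, here is a minimal Ryu sketch of this kind of meter/DSCP setup. It is not the actual controller app from test.zip; the table ids, meter_id, group_id and port numbers are assumptions based on the description and the flow dumps later in this thread.

```python
# Sketch of the DSCP-remark metering on s1, for a Ryu OpenFlow 1.3 app.
# Not the original controller code; ids and port numbers are assumed.
def install_s1_rules(dp):
    ofp, parser = dp.ofproto, dp.ofproto_parser

    # Meter 1: a single dscp_remark band. Traffic above 8500 kbps has its drop
    # precedence raised (prec_level=1), observed in this setup as dscp 2 -> 4.
    band = parser.OFPMeterBandDscpRemark(rate=8500, burst_size=0, prec_level=1)
    dp.send_msg(parser.OFPMeterMod(dp, command=ofp.OFPMC_ADD,
                                   flags=ofp.OFPMF_KBPS, meter_id=1, bands=[band]))

    def add_flow(table_id, match, insts):
        dp.send_msg(parser.OFPFlowMod(dp, table_id=table_id, priority=100,
                                      match=match, instructions=insts))

    def apply_actions(actions):
        return parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)

    # Table 0: run h1's traffic through the meter, then re-match the (possibly
    # remarked) DSCP in table 1.
    add_flow(0, parser.OFPMatch(in_port=1, eth_type=0x0800),
             [parser.OFPInstructionMeter(1), parser.OFPInstructionGotoTable(1)])

    # Table 1: conforming traffic (dscp=2) takes the direct (s1, s2) link on
    # port 2; remarked traffic (dscp=4) goes to a select group over ports 3
    # and 4. (Creation of select group 1 with its two buckets is omitted here.)
    add_flow(1, parser.OFPMatch(eth_type=0x0800, ip_dscp=2),
             [apply_actions([parser.OFPActionOutput(2)])])
    add_flow(1, parser.OFPMatch(eth_type=0x0800, ip_dscp=4),
             [apply_actions([parser.OFPActionGroup(group_id=1)])])
```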

I was expecting no traffic loss in this case. However, the iperf results showed about 27% packet loss. I checked the packet counters of s1/s2/s3/s4, and found that the number of packets entering s2/s3/s4 is smaller than the number of packets leaving s1.

My initial guess was that the switch or port buffer size is too small. I tried the following:

  1. Setting the max_queue_size of a link to a large number (see the sketch after this list): this only made things worse, because not setting max_queue_size at all means an infinite queue size.
  2. Increasing N_PKT_BUFFERS from 256 to 2^12 in dp_buffers.c: I thought this might increase the switch buffer size, but unfortunately it does not affect the iperf results.
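
As a reference for item 1, a hedged sketch of how the queue cap would be set with Mininet's TCLink (the actual value and where exper3.py sets it may differ):

```python
# Sketch of item 1 above: explicitly capping the egress queue of the (s1, s2)
# link. TCLink passes max_queue_size through to tc; leaving it unset means no
# explicit packet limit on the queue. 1000 packets is an arbitrary example.
from mininet.link import TCLink

def add_s1_s2_link(net, s1, s2):
    return net.addLink(s1, s2, cls=TCLink, bw=10, max_queue_size=1000)
```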

To reproduce this issue, download and unzip test.zip first (I am running the test on Ubuntu 16.04):

```
$ ryu-manager
$ sudo python exper3.py
$ sudo ./exper3.sh
```

Then, in mininet:

```
$ h2 iperf -u -s -p 5566 -i 1 > server.log &
$ h1 iperf -u -c 10.0.0.2 -b 26M -p 5566 -t 5 --tos 0x08 > client.log
```

test.zip

ederlf commented 4 years ago

I don't know if you noticed, but your topology is a ring. If your application is flooding ARP traffic, it is likely being replicated in a loop, causing a broadcast storm.

shizhenzhao commented 4 years ago

I know it is a ring. So we set up static ARP entries at the hosts to avoid ARP flooding.
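
For completeness, a hedged sketch of how the static ARP entries might be pinned in the Mininet script. The addresses are assumptions: h2's MAC follows the eth_dst visible in the flow dumps below, h1's IP/MAC follow Mininet defaults.

```python
# Pin ARP entries on both hosts so no ARP broadcasts cross the ring topology.
# Node.setARP() just runs 'arp -s <ip> <mac>' inside the host's namespace.
def set_static_arp(net):
    h1, h2 = net.get('h1'), net.get('h2')
    h1.setARP('10.0.0.2', '00:00:00:00:00:02')   # h2
    h2.setARP('10.0.0.1', '00:00:00:00:00:01')   # h1 (assumed addresses)
```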

ederlf commented 4 years ago

From the first attempts:

I tried to reproduce it, but the iperf server simply does not output any result, so I cannot tell exactly what the issue is.

I have checked the link utilization with bwm-ng, and the distribution of traffic at s2 looks pretty close to what it should be.

bwm-ng -u bits -I s2-eth1,s2-eth3,s2-eth4

However, the traffic entering s1 and leaving s2 does not match.

bwm-ng -u bits -I s1-eth1,s2-eth2

This needs further investigation, but I'd ask you to run the same test again, this time only from h1 to h2 directly through s2, without rate limiting. If the result is as expected, this might give a hint about where the problem is.

shizhenzhao commented 4 years ago

Thanks for your response. In order to reproduce the issue, a controller needs to be running; in my case, I used Ryu. I have updated the reproduction steps.

I also observe that the traffic leaving s1-eth2 and entering s2-eth1 does not match. I also tried sending traffic from h1 to h2 without rate limiting. If we increase the bandwidth of (s1, s2) to 30 Mbps, the traffic leaving s1-eth2 and entering s2-eth1 matches. If we keep (s1, s2) at 10 Mbps, there is of course packet loss, because there is not enough capacity.

ederlf commented 4 years ago

Thanks for testing it.

It looks like the counters match for all three flows in s2. But indeed, observing the packet count from s1 to s2 (7564 sent vs. 3657 received), we can see that they differ. We can also see that the load balancing done by the group is perfect (5424 vs. 5425 packets).

table="1", match="oxm{in_port="1", eth_dst="00:00:00:00:00:02", eth_type="0x800", ip_dscp="2"}", dur_s="222", dur_ns="708000000", prio="32768", idle_to="0", hard_to="0", cookie="0x0", **pkt_cnt="7564",** byte_cnt="11436768", insts=[apply{acts=[out{port="2"}]}]},
[{table="0", match="oxm{in_port="3", eth_dst="00:00:00:00:00:02"}", dur_s="45", dur_ns="741000000", prio="32768", idle_to="0", hard_to="0", cookie="0x0", pkt_cnt="5424", byte_cnt="8201088", insts=[apply{acts=[out{port="2"}]}]},

{table="0", match="oxm{in_port="1", eth_dst="00:00:00:00:00:02"}", dur_s="45", dur_ns="736000000", prio="32768", idle_to="0", hard_to="0", cookie="0x0", **pkt_cnt="3657**", byte_cnt="5529384", insts=[apply{acts=[out{port="2"}]}]},

{table="0", match="oxm{in_port="4", eth_dst="00:00:00:00:00:02"}", dur_s="45", dur_ns="724000000", prio="32768", idle_to="0", hard_to="0", cookie="0x0", pkt_cnt="5425", byte_cnt="8202600", insts=[apply{acts=[out{port="2"}]}]},

So I'll look at what might be causing the reduced number of packets arriving at s2.

shizhenzhao commented 4 years ago

I checked the behavior of the DSCP remark with rate 8.5 Mbps. The incoming rate is 26 Mbps. In every second, the first 8.5 Mbit of packets is forwarded to the link (s1, s2). This means that during the first 8.5/26 ≈ 0.33 s of each second, the incoming rate to (s1, s2) is actually 26 Mbps, and for the rest of the second it is 0. Note that the link rate of (s1, s2) is only 10 Mbps. Will this cause packet loss?
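
A back-of-the-envelope check of this hypothesis (assuming the per-second gating model described above and the ~1512-byte packets implied by the byte/packet counters earlier in the thread):

```python
# Rough estimate of the backlog the (s1, s2) egress queue must absorb if
# 8.5 Mbit arrives in a 26 Mbps burst at the start of each second.
burst_bits   = 8.5e6          # bits admitted per second by the meter band
arrival_rate = 26e6           # bits/s during the burst (iperf -b 26M)
link_rate    = 10e6           # bits/s on the (s1, s2) link
pkt_bits     = 1512 * 8       # ~1512 B/packet, from byte_cnt / pkt_cnt above

burst_time = burst_bits / arrival_rate                 # ~0.33 s
backlog    = (arrival_rate - link_rate) * burst_time   # ~5.2 Mbit
print('burst of %.2f s, peak backlog ~%.1f Mbit (~%d packets)'
      % (burst_time, backlog / 1e6, backlog / pkt_bits))
# Roughly 430 queued packets would be needed, far beyond typical default
# queue sizes, so drops on s1's egress toward s2 are plausible in this model.
```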

ederlf commented 4 years ago

It is a possible reason. The rate of a flow is measured with a token bucket algorithm, so packets are not marked while the bucket still has tokens available.

Therefore, all traffic is sent via s2 until the switch detects that the rate limit has been exceeded.

shizhenzhao commented 4 years ago

How frequently are tokens added to the bucket in ofsoftswitch? Is it once per second? Can we increase the frequency? This might reduce the burstiness of the DSCP-remarked flow.

ederlf commented 4 years ago

The tokens are added every 100 ms. I have experimented with a smaller interval.

Also, because marking actually happens when the bucket has no available tokens, I tried starting with a full bucket, but with no success.
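
To make the mechanism concrete, here is a toy model of a single dscp_remark band. It is a hedged sketch of the behaviour described in this thread, not the actual ofsoftswitch13 code; the refill period and packet size are the values discussed above.

```python
# Toy token-bucket model of a dscp_remark band: tokens are refilled every
# 100 ms; a packet that finds enough tokens passes unmarked, otherwise it is
# remarked. Illustration only, not the switch implementation.
RATE_KBPS = 8500              # band rate
REFILL_MS = 100               # refill period mentioned above
PKT_BITS  = 1512 * 8          # approx. packet size seen in the flow counters

class DscpRemarkBand(object):
    def __init__(self, start_full=False):
        self.capacity = RATE_KBPS * 1000 * REFILL_MS // 1000   # bits per refill
        self.tokens = self.capacity if start_full else 0

    def refill(self):
        # Called every REFILL_MS; one refill is worth exactly one full bucket.
        self.tokens = self.capacity

    def on_packet(self):
        if self.tokens >= PKT_BITS:
            self.tokens -= PKT_BITS
            return 'forward (dscp stays 2)'
        return 'remark (dscp becomes 4)'

# At a 26 Mbps arrival rate, each 100 ms window carries ~2.6 Mbit but only
# 0.85 Mbit of tokens: the first ~70 packets of every window pass unmarked
# back-to-back at the 26 Mbps arrival rate, which is the burst hitting the
# 10 Mbps (s1, s2) link.
```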

shizhenzhao commented 4 years ago

I conducted another two experiments:

  1. I tried reducing the bandwidth and the iperf sending rate by 10x, and the packet loss disappeared.
  2. Based on the first experiment, I suspected that ofsoftswitch may not have enough CPU to support this bandwidth, so I ran the original experiment on a powerful server. Unfortunately, I saw no improvement.