NetSys / bess

BESS: Berkeley Extensible Software Switch

DRR module not fairly scheduling flows #772

Open jmhelt opened 6 years ago

jmhelt commented 6 years ago

I believe there is an issue with the DRR module in scheduling between different flows (5-tuples). I added an IPForward module to the drr sample configuration to count the number of packets being transmitted for each flow; a snapshot of the pipeline is below. The snapshot shows that the DRR module (q) is transmitting 30018 packets/sec from flow 1 but only 19931 packets/sec from flow 2. I have observed similar imbalances in an end-to-end example with concurrent iperf (TCP and UDP) flows.

+--------+                +----------+                +-----+                +--------------+                +-------+
|  src1  |  :0 89908 0:   | rewrite0 |  :0 89905 0:   |  q  |  :0 49952 0:   |    ipfwd     |  :0 30018 0:   | sink0 |
| Source | -------------> | Rewrite  | -------------> | DRR | -------------> |   IPLookup   | -------------> | Sink  |
+--------+                +----------+                +-----+                +--------------+                +-------+
                                                        ^                      |
                                                        |                      | :1 19931 0:
                                                        |                      v
+--------+                +----------+                  |                    +--------------+
|  src2  |  :0 39953 0:   | rewrite1 |  :0 39980 0:     |                    |    sink1     |
| Source | -------------> | Rewrite  | -----------------+                    |     Sink     |
+--------+                +----------+                                       +--------------+

For reference, here is the updated configuration for the pipeline above.

import scapy.all as scapy

eth = scapy.Ether(src='02:1e:67:9f:4d:ae', dst='06:16:3e:1b:72:32')
ip1 = scapy.IP(src='192.168.1.1', dst='10.0.0.1')
ip2 = scapy.IP(src='192.168.2.2', dst='10.0.0.2')
udp = scapy.UDP(sport=10001, dport=10002)
payload = 'helloworld'
pkt_bytes1 = bytes(eth/ip1/udp/payload)
pkt_bytes2 = bytes(eth/ip2/udp/payload)

# Create the pipeline from two sources, rewriting each to have its own
# source and destination IP address, then pushing them into the DRR
# module so that they form two distinct flows. (Flows are identified by
# the five-tuple.)
src1::Source() -> Rewrite(templates=[pkt_bytes1]) \
  -> q::DRR(num_flows=4, quantum=1000)
src2::Source() -> Rewrite(templates=[pkt_bytes2]) \
  -> q

q -> ipfwd::IPLookup()

ipfwd:0 -> Sink()
ipfwd:1 -> Sink()

ipfwd.add(prefix='10.0.0.1', prefix_len=32, gate=0)
ipfwd.add(prefix='10.0.0.2', prefix_len=32, gate=1)

# DRR and Source use tasks to forward packets. To throttle the forwarding
# rate, one can put a rate limiter on the tasks associated with these
# modules. As a result of the throttle on DRR, it will fairly distribute
# the throughput across all of the flows. (To maximize fairness, the
# quantum should be tweaked.)
bess.add_tc('fast', policy='rate_limit', resource='packet', limit={'packet': 90000})
src1.attach_task(parent='fast')

bess.add_tc('slow', policy='rate_limit', resource='packet', limit={'packet': 40000})
src2.attach_task(parent='slow')

bess.add_tc('medium', policy='rate_limit', resource='packet', limit={'packet': 50000})
q.attach_task(parent='medium')
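
As an aside on how the templates above map to flows: the two rewrite templates share the same UDP ports but differ in their IP addresses, so they still form distinct five-tuples. A minimal sketch (plain Python, not the BESS API; `five_tuple` is an illustrative helper) showing this:

```python
# Hypothetical helper mirroring how the DRR module conceptually keys flows.
def five_tuple(src_ip, dst_ip, src_port, dst_port, proto):
    """Flow key: the classic 5-tuple."""
    return (src_ip, dst_ip, src_port, dst_port, proto)

# Values taken from the scapy templates in the configuration above.
flow1 = five_tuple('192.168.1.1', '10.0.0.1', 10001, 10002, 'udp')
flow2 = five_tuple('192.168.2.2', '10.0.0.2', 10001, 10002, 'udp')

# Same ports, different IPs -> two distinct flows.
assert flow1 != flow2
```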

I am still in the process of debugging this issue further but was able to reliably replicate it with a small number (< 5) of flows. Using gdb with a breakpoint on DRR::GetNextFlow, I noticed that in some cases, one (but not necessarily all) of the disadvantaged flows is missing from the flow queue, although it must make it back into the queue at some point since it receives some service. I haven't yet determined the sequence of events that reproduces this behavior, though.

Any further suggestions for debugging? Thank you in advance.

jmhelt commented 6 years ago

Here is some gdb output showing what I was describing. I set a breakpoint where the quantum is added to each flow's deficit (drr.cc:255) and printed the flow IDs. There are 5 active (iperf) flows in this test, but the 5th flow seems to be missing (source ports are unique for each flow).

Thread 5 "bessd" hit Breakpoint 2, DRR::GetNextFlow (this=this@entry=0x7f146400f900, err=err@entry=0x7f147bf958ec) at modules/drr.cc:255
warning: Source file is more recent than executable.
255     f->deficit += quantum_;
(gdb) p f->id
$1 = {src_ip = 3232235778, dst_ip = 167772162, src_port = 56811, dst_port = 5201, protocol = 6 '\006'}
(gdb) c
Continuing.

Thread 5 "bessd" hit Breakpoint 2, DRR::GetNextFlow (this=this@entry=0x7f146400f900, err=err@entry=0x7f147bf958ec) at modules/drr.cc:255
255     f->deficit += quantum_;
(gdb) p f->id
$2 = {src_ip = 3232235780, dst_ip = 167772164, src_port = 49625, dst_port = 5201, protocol = 6 '\006'}
(gdb) c
Continuing.

Thread 5 "bessd" hit Breakpoint 2, DRR::GetNextFlow (this=this@entry=0x7f146400f900, err=err@entry=0x7f147bf958ec) at modules/drr.cc:255
255     f->deficit += quantum_;
(gdb) p f->id
$3 = {src_ip = 3232235779, dst_ip = 167772163, src_port = 32923, dst_port = 5201, protocol = 6 '\006'}
(gdb) c
Continuing.

Thread 5 "bessd" hit Breakpoint 2, DRR::GetNextFlow (this=this@entry=0x7f146400f900, err=err@entry=0x7f147bf958ec) at modules/drr.cc:255
255     f->deficit += quantum_;
(gdb) p f->id
$4 = {src_ip = 3232235782, dst_ip = 167772166, src_port = 55952, dst_port = 5201, protocol = 6 '\006'}
(gdb) c
Continuing.

Thread 5 "bessd" hit Breakpoint 2, DRR::GetNextFlow (this=this@entry=0x7f146400f900, err=err@entry=0x7f147bf958ec) at modules/drr.cc:255
255     f->deficit += quantum_;
(gdb) p f->id
$5 = {src_ip = 3232235778, dst_ip = 167772162, src_port = 56811, dst_port = 5201, protocol = 6 '\006'}

Is there a reason why this implementation does not use a linked list as its active queue? I only ever see one thread executing this code, but I don't have a full understanding of BESS's execution model. I ask because the llring data structure seems clunky in the way it's currently used as an active queue (e.g., requiring extra next-packet pointers).
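
For comparison, the textbook DRR algorithm (Shreedhar & Varghese) keeps backlogged flows in a FIFO active list, which is what a linked-list-based design would amount to. A standalone Python sketch of that algorithm (not the BESS implementation; all names here are illustrative):

```python
from collections import deque

def drr_schedule(flows, quantum, rounds):
    """Textbook DRR: backlogged flows sit in a FIFO active list; each
    visit adds `quantum` to the flow's deficit and dequeues packets
    while the deficit covers them.  `flows` maps flow id -> deque of
    packet sizes; returns packets served per flow."""
    active = deque(fid for fid, q in flows.items() if q)
    deficit = {fid: 0 for fid in flows}
    served = {fid: 0 for fid in flows}
    for _ in range(rounds):
        if not active:
            break
        fid = active.popleft()
        deficit[fid] += quantum
        q = flows[fid]
        while q and deficit[fid] >= q[0]:
            deficit[fid] -= q.popleft()
            served[fid] += 1
        if q:
            active.append(fid)   # still backlogged: back of the list
        else:
            deficit[fid] = 0     # idle flows lose their deficit
    return served

# Two equally backlogged flows with equal-size packets get equal service.
flows = {1: deque([100] * 50), 2: deque([100] * 50)}
print(drr_schedule(flows, quantum=100, rounds=40))  # {1: 20, 2: 20}
```

With a FIFO active list like this, a backlogged flow can never go missing between rounds, which is the invariant the gdb output above suggests is being violated.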

barath commented 6 years ago

Good questions; the DRR module is a bit experimental, and I don't know that I can answer your questions about it. Maybe @TheJStone, who wrote it, could weigh in?

TheJStone commented 6 years ago

Although clunky, llring was better optimized for this application. I am not sure how much has changed since I worked on this almost a year ago. If the sample code and tests are still fair, then this issue might be resolved by tweaking the quanta; in my experience, fairness was heavily dependent on the quantum chosen, as mentioned in the sample file. If the sample code and tests are not fair, then I am not sure what the issue might be without digging into it further. The module was not rigorously tested for the effects of various pipelines on fairness, and I could see an egress bottleneck having adverse effects, including an increased drop rate that might not necessarily be fair.

jmhelt commented 6 years ago

I ran the DRR module tests, and they pass. I'm not sure how to test whether the sample code produces fair output because it terminates in a Sink, so I can't measure per-flow throughput. The configuration above was my attempt to turn the sample configuration into a fairness test: it is simply the DRR sample code (with the same quanta) plus an IPForward module to split the flows and measure per-flow throughput.

I will try implementing a simple RR module in the configuration above to see if that yields the expected, fair throughput.
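
A standalone sketch of what such a round-robin baseline would do (plain Python, not a BESS module): with equal-size packets, one packet per visit across backlogged flows is exactly the fair split the DRR output should match.

```python
from collections import deque

def rr_schedule(flows, rounds):
    """Plain round robin: visit backlogged flows in FIFO order, serving
    one packet per visit.  `flows` maps flow id -> deque of packet
    sizes; returns packets served per flow."""
    active = deque(fid for fid, q in flows.items() if q)
    served = {fid: 0 for fid in flows}
    for _ in range(rounds):
        if not active:
            break
        fid = active.popleft()
        flows[fid].popleft()
        served[fid] += 1
        if flows[fid]:
            active.append(fid)   # still backlogged: rejoin the rotation
    return served

# Equal backlogs -> equal service, the expected fair baseline.
flows = {1: deque([100] * 30), 2: deque([100] * 30)}
assert rr_schedule(flows, 20) == {1: 10, 2: 10}
```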