NetSys / bess

BESS: Berkeley Extensible Software Switch

Backpressure-like behavior when backpressure is disabled #737

Open justinemarie opened 6 years ago

justinemarie commented 6 years ago

Hi Fam,

Here's the config. I have two containers connected to BESS with Ports 0 and 1.

[Vport/PortInc 1] -> Queue -> [Vport/PortOut 0]   (forward traffic)
[Vport/PortInc 0] -> [Vport/PortOut 1]            (return traffic)

I set a rate limit on the Queue -> PortOut path of, say, 10 Mbps.

I would expect to see the queue overflow when I send in too much traffic, e.g., using perf or a ping flood between the containers. But the queue never overflows. Instead, I see the sending rate decrease and no drops.

I cannot for the life of me figure out what is happening! I have tried setting the rate limit in packets, Mbps, and cycles. I have tried perf + TCP, perf + UDP, and a ping flood, and I consistently see the same behavior. If I replace [PortInc 1] with a Source module, I do get drops; but with VPort + PortInc I see this weird backpressure-like behavior, even though backpressure is turned off.

melvinw commented 6 years ago

Hi, trying to recreate!

I've got a bessctl script that I think mimics what you've described (replacing perf/ping with a Source, but keeping both PortIncs/PortOuts):

bess.add_worker(wid=1, core=1) # tried with both schedulers
bess.add_tc('rl', policy='rate_limit', wid=1, resource='bit', limit={'bit': int(10e6)})

p1 = VPort(loopback=1, rxq_cpus=[1])
p2 = VPort(loopback=1, rxq_cpus=[1])

pinc1::PortInc(port=p1)
pinc2::PortInc(port=p2)

pout1::PortOut(port=p1)
pout2::PortOut(port=p2)

src::Source()
q::Queue()
q.attach_task(parent='rl')

src -> pout1
pinc1 -> pout2
pinc2 -> q -> pout1

TC tree looks like:

<worker 1>
  +-- !default_rr_1            round_robin
      +-- rl                   rate_limit          10.000 Mbps
      |   +-- !leaf_q:0        leaf
      +-- !leaf_pinc1:0        leaf
      +-- !leaf_pinc2:0        leaf
      +-- !leaf_src:0          leaf

But I don't see the backpressure effect:

$ bin/bessctl daemon start -- run file ~/backpressure.bess -- monitor pipeline
+--------------+                +--------------+
|    pinc1     |                |    pout2     |
|   PortInc    |  :0 56869 0:   |   PortOut    |
| vport0/VPort | -------------> | vport1/VPort |
+--------------+                +--------------+
+--------------+                +--------------+                +--------------+                   +--------+
|    pinc2     |                |      q       |                |    pout1     |                   |  src   |
|   PortInc    |  :0 56646 0:   |    Queue     |  :0 14845 0:   |   PortOut    |  :0 70907167 0:   | Source |
| vport1/VPort | -------------> |   991/1024   | -------------> | vport0/VPort | <---------------- |        |
+--------------+                +--------------+                +--------------+                   +--------+

+--------------+                +--------------+
|    pinc1     |                |    pout2     |
|   PortInc    |  :0 56484 0:   |   PortOut    |
| vport0/VPort | -------------> | vport1/VPort |
+--------------+                +--------------+
+--------------+                +--------------+                +--------------+                   +--------+
|    pinc2     |                |      q       |                |    pout1     |                   |  src   |
|   PortInc    |  :0 56458 0:   |    Queue     |  :0 14848 0:   |   PortOut    |  :0 71105358 0:   | Source |
| vport1/VPort | -------------> |   991/1024   | -------------> | vport0/VPort | <---------------- |        |
+--------------+                +--------------+                +--------------+                   +--------+

+--------------+                +--------------+
|    pinc1     |                |    pout2     |
|   PortInc    |  :0 56236 0:   |   PortOut    |
| vport0/VPort | -------------> | vport1/VPort |
+--------------+                +--------------+
+--------------+                +--------------+                +--------------+                   +--------+
|    pinc2     |                |      q       |                |    pout1     |                   |  src   |
|   PortInc    |  :0 56442 0:   |    Queue     |  :0 14890 0:   |   PortOut    |  :0 71006082 0:   | Source |
| vport1/VPort | -------------> |  1023/1024   | -------------> | vport0/VPort | <---------------- |        |
+--------------+                +--------------+                +--------------+                   +--------+

Are things running on separate cores in your setup? I can't measure that with this script, but I'll try with real NICs/containers in the AM.

rware commented 6 years ago

Not sure if things are running on separate cores, but here's the BESS config I'm using that triggers the error (BESS_QUEUE_SIZE is set to something greater than 0, so the "if" branch is always taken):

import os

print($BESS_NIC_PCI)
print($BESS_IPADDR_MASK)
print($BESS_QUEUE_SIZE)
print($BESS_QUEUE_SPEED)

pmdport = PMDPort(pci=$BESS_NIC_PCI)
#vport = VPort(ifname='bess0', ip_addrs=[$BESS_IPADDR_MASK])
vport = VPort(ifname=$BESS_IFNAME, ip_addrs=[$BESS_IPADDR_MASK])

# check environment variables to decide whether to insert the rate-limited Queue
if int($BESS_QUEUE_SIZE!'0') > 0:
    btl_queue = Queue(size=int($BESS_QUEUE_SIZE))
    btl_queue.set_burst(burst=1)
    bess.add_tc('bit_limit',
                policy='rate_limit',
                resource='bit',
                limit={'bit': 1000000*int($BESS_QUEUE_SPEED!'1')}) # rate in Mbps; default is 1
    btl_queue.attach_task(parent='bit_limit')

    pmd_in = PortInc(port=pmdport.name)
    vport_in = PortInc(port=vport)
    pmd_in -> PortOut(port=vport)
    vport_in -> btl_queue -> PortOut(port=pmdport.name)
else:
    PortInc(port=pmdport.name) -> PortOut(port=vport)
    PortInc(port=vport) -> PortOut(port=pmdport.name)

os.system('sudo ifconfig {} broadcast {}'.format($BESS_IFNAME, $BESS_BCAST))
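
(For context, a sketch of how a config like this might be launched, assuming bessctl resolves the $BESS_*!'default' substitutions from the shell environment; every value below is a placeholder, not from this thread:)

# Hypothetical launcher for the config above; all values are placeholders.
# Assumes bessctl resolves $BESS_* substitutions from the environment it
# inherits, and that the config above is saved as rware.bess.
import os

os.environ.update({
    'BESS_NIC_PCI': '0000:04:00.0',   # PCI address of the physical NIC
    'BESS_IFNAME': 'bess0',
    'BESS_IPADDR_MASK': '10.0.0.1/24',
    'BESS_BCAST': '10.0.0.255',
    'BESS_QUEUE_SIZE': '1024',        # > 0 takes the rate-limited Queue branch
    'BESS_QUEUE_SPEED': '10',         # Mbps for the 'bit_limit' rate limiter
})
os.system('bin/bessctl daemon start -- run file rware.bess')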

Not sure if the differences between what you have and what I have matter.

The error also happens "most of the time," not all the time, so you may have to run things a few times before you see the weird behavior.

justinemarie commented 6 years ago

Hi Melvin,

Thanks for looking into this.

Justine

melvinw commented 6 years ago

I didn't have proper containers handy last night, so I substituted them with a Source and a pair of loopback VPorts. I figured they would just act as an extreme version of iperf. Will try with @rware's script and real containers next.

melvinw commented 6 years ago

Aha! Have managed to reproduce with a variant of samples/vport_ping. Will dig around for the cause now.

import os

bess.add_worker(wid=1, core=1)
bess.add_tc('rl', policy='rate_limit', wid=1, resource='bit', limit={'bit': int(1e6)})

os.system('docker pull ubuntu > /dev/null')
os.system('docker run -i -d --net=none --name=vport_test ubuntu /bin/bash > /dev/null')

# Alice lives outside the container, wanting to talk to Bob in the container
v_bob = VPort(ifname='eth_bob', docker='vport_test', ip_addrs=['10.255.99.2/24'])
v_alice = VPort(ifname='eth_alice', ip_addrs=['10.255.99.1/24'])

PortInc(port=v_alice) -> q::Queue() -> PortOut(port=v_bob)
PortInc(port=v_bob) -> PortOut(port=v_alice)
q.attach_task(parent='rl')

bess.resume_all()

os.system('sudo ping -W 1.0 -c {} -i 0 10.255.99.2'.format(2**20))

bess.pause_all()

os.system('docker kill vport_test > /dev/null')
os.system('docker rm vport_test > /dev/null')

bess.reset_all()

localhost:10514 $ show tc
<worker 1>
  +-- !default_rr_1            round_robin
      +-- rl                   rate_limit          1.000 Mbps
      |   +-- !leaf_q:0        leaf
      +-- !leaf_port_inc0:0    leaf
      +-- !leaf_port_inc1:0    leaf
localhost:10514 $ monitor tc
Monitoring traffic classes: rl, !leaf_port_inc0:0, !leaf_port_inc1:0, !default_rr_1, !leaf_q:0

13:08:08.409548            CPU MHz   scheduled        Mpps        Mbps  pkts/sched    cycles/p
----------------------------------------------------------------------------------------------
W1 rl                        1.386        1024       0.001       0.999       1.000    1353.345
W1 !leaf_port_inc0:0      1048.036    11798448       0.001       0.999       0.000 1023474.589
W1 !leaf_port_inc1:0      1050.609    11798449       0.001       0.999       0.000 1025993.384
W1 !default_rr_1          2100.032    23597928       0.003       2.998       0.000  683612.376
W1 !leaf_q:0                 1.386        1023       0.001       0.999       1.000    1353.396
----------------------------------------------------------------------------------------------

13:08:09.411741            CPU MHz   scheduled        Mpps        Mbps  pkts/sched    cycles/p
----------------------------------------------------------------------------------------------
W1 rl                        1.365        1023       0.001       0.999       1.000    1333.407
W1 !leaf_port_inc0:0      1048.203    11797955       0.001       0.999       0.000 1023879.322
W1 !leaf_port_inc1:0      1050.463    11797934       0.001       1.000       0.000 1025101.418
W1 !default_rr_1          2100.031    23596912       0.003       2.998       0.000  683776.300
W1 !leaf_q:0                 1.365        1023       0.001       0.999       1.000    1333.353
----------------------------------------------------------------------------------------------
localhost:10514 $ monitor pipeline
+--------------+               +--------------+               +--------------+
|  port_inc0   |               |      q       |               |  port_out0   |
|   PortInc    |  :0 1023 0:   |    Queue     |  :0 1024 0:   |   PortOut    |
| vport1/VPort | ------------> |    1/1024    | ------------> | vport0/VPort |
+--------------+               +--------------+               +--------------+
+--------------+               +--------------+
|  port_inc1   |               |  port_out1   |
|   PortInc    |  :0 1024 0:   |   PortOut    |
| vport0/VPort | ------------> | vport1/VPort |
+--------------+               +--------------+

justinemarie commented 6 years ago

Thank you, Melvin!

Sent from my rotary phone.

justinemarie commented 6 years ago

Happy New Year! Any updates?

justinemarie commented 6 years ago

@rware and I ran some tests today trying to trace down the origin of this. One thing we did was plot the queue occupancy in the VPort directly (drv_to_sn). We found that it stays mostly empty, which suggests the scheduler is not the problem: it is keeping up with whatever arrives in the queue. Is something strange happening on the kernel side?
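
(A sketch of one way to sample occupancy like this from the outside, assuming bessctl is on PATH, the Queue module is named q, and — an assumption — that "show module q" reports the occupancy that "monitor pipeline" displays; watching VPort's internal drv_to_sn ring itself requires instrumenting the driver:)

import subprocess
import time

# Poll the daemon twice a second and timestamp whatever it reports for `q`.
# Assumes a running bessd and a Queue module named `q` in the pipeline.
while True:
    out = subprocess.check_output(['bessctl', 'show', 'module', 'q'])
    print(time.strftime('%H:%M:%S'), out.decode().strip())
    time.sleep(0.5)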

sangjinhan commented 6 years ago

Remember that, with TCP, the sending rate can be capped by the "ACK clocking" effect. Say the link A->B is limited to 1 Mbps; the returning ACK packets (B->A) are then spread out accordingly. Since a TCP sender in steady state transmits a new packet only when an ACK is received, it will never overload the queue.

This is why you cannot use TCP to see the queue overflowing. ping cannot be used either, since it behaves almost the same way as TCP (once the request rate exceeds 100 pps, a ping flood sends a new ICMP request only after receiving the previous ICMP response).
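
(Back-of-the-envelope, with assumed numbers: at a 1 Mbps bottleneck with 1500-byte packets, the queue drains at roughly 83 packets/s, so an ACK-clocked sender tops out at exactly the drain rate:)

# Illustrative ACK-clocking arithmetic; the packet size is an assumption,
# not a measurement from this thread.
link_bps = 1e6                    # A -> B rate limit
pkt_bits = 1500 * 8               # full-size data packet
drain_pps = link_bps / pkt_bits   # ~83.3 packets/s leave the queue
send_pps = drain_pps              # steady state: one new packet per returning ACK
print('%.1f pkts/s in = %.1f pkts/s out; the queue never builds' % (send_pps, drain_pps))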

By perf+UDP, do you mean UDP_RR or UDP_STREAM? If you use UDP_RR, you will never see queue overflow, since it is essentially identical to ping -f. With one-way UDP transmission, you will be able to see the queue overflowing. Check out this example:

import os

bess.add_worker(wid=1, core=1)
bess.add_tc('rl', policy='rate_limit', wid=1, resource='bit', limit={'bit': int(1e7)})

os.system('docker pull ubuntu > /dev/null')
os.system('docker run -i -d -v /usr/bin:/usr/bin:ro --net=none --name=vport_test ubuntu iperf -s')

# Alice lives outside the container, wanting to talk to Bob in the container
v_bob = VPort(ifname='eth_bob', docker='vport_test', ip_addrs=['10.255.99.2/24'])
v_alice = VPort(ifname='eth_alice', ip_addrs=['10.255.99.1/24'])

PortInc(port=v_alice) -> q::Queue() -> PortOut(port=v_bob)
PortInc(port=v_bob) -> PortOut(port=v_alice)
q.attach_task(parent='rl')

bess.resume_all()

os.system('iperf -c 10.255.99.2 -u -b 15M -i 1 -t 1000')

bess.pause_all()

os.system('docker kill vport_test > /dev/null')
os.system('docker rm vport_test > /dev/null')

bess.reset_all()

Then I see the queue is full:

$ bessctl moni pip
+--------------+               +--------------+              +--------------+
|  port_inc0   |               |      q       |              |  port_out0   |
|   PortInc    |  :0 1275 0:   |    Queue     |  :0 797 0:   |   PortOut    |
| vport1/VPort | ------------> |  1023/1024   | -----------> | vport0/VPort |
+--------------+               +--------------+              +--------------+
+--------------+               +--------------+
|  port_inc1   |               |  port_out1   |
|   PortInc    |  :0 0 0:      |   PortOut    |
| vport0/VPort | ------------> | vport1/VPort |
+--------------+               +--------------+
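
(As a sanity check on the example above, rough arithmetic with an assumed 1500-byte packet size: iperf offers 15 Mbps against a 10 Mbps limit, so the queue gains about 5 Mbps and its 1024 slots fill within a few seconds:)

# Rough overflow arithmetic for the iperf example; the packet size is an
# assumption, since iperf's UDP datagram size may differ.
offered_bps = 15e6                                     # iperf -c ... -b 15M
drain_bps = 1e7                                        # the 'rl' rate limit
surplus_pps = (offered_bps - drain_bps) / (1500 * 8)   # ~417 packets/s queued
fill_time_s = 1024 / surplus_pps                       # ~2.5 s to fill the Queue
print('queue fills in ~%.1f s' % fill_time_s)
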
shinae-woo commented 6 years ago

@justinemarie Can I close the issue? Or I can follow up :-)

justinemarie commented 6 years ago

The issue still exists with VPort. We moved to using only PMDPorts to work around it, since we couldn't figure out what was wrong after several weeks of debugging.

Sent from my rotary phone.
