emmericp / MoonGen

MoonGen is a fully scriptable high-speed packet generator built on DPDK and LuaJIT. It can saturate a 10 Gbit/s connection with 64 byte packets on a single CPU core while executing user-provided Lua scripts for each packet. Multi-core support allows for even higher rates. It also features precise and accurate timestamping and rate control.
MIT License

Less code in the sending loop leads to bandwidth decrease. #311

Open edgar-costa opened 3 years ago

edgar-costa commented 3 years ago

Hi Paul,

I should probably ask this in libmoon, but since it is a general question and there is a bit more activity here, I thought it would be more useful to ask it here.

This is just out of curiosity. I observed that with less code in the sending loop I get 9-10 Mpps instead of 14.8 Mpps. For example, using https://github.com/libmoon/libmoon/blob/master/examples/pktgen.lua:

When the main sending loop modifies the src port:

for i, buf in ipairs(bufs) do
    -- packet framework allows simple access to fields in complex protocol stacks
    local pkt = buf:getUdpPacket()
    pkt.udp:setSrcPort(SRC_PORT_BASE + math.random(0, NUM_FLOWS - 1))
end
[INFO]  1 device is up.
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 13.86 Mpps, 7096 Mbit/s (9313 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 14.41 Mpps, 7378 Mbit/s (9684 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 14.87 Mpps, 7613 Mbit/s (9992 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 14.85 Mpps, 7602 Mbit/s (9978 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 14.85 Mpps, 7602 Mbit/s (9977 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 14.88 Mpps, 7616 Mbit/s (9996 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 14.87 Mpps, 7612 Mbit/s (9990 Mbit/s with framing)
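(For reference, ~14.88 Mpps is exactly line rate for 64-byte frames on 10 GbE, which is why the "with framing" column tops out near 10000 Mbit/s. A quick sanity check of the numbers above, counting the standard 20 bytes of per-frame wire overhead:)

```lua
-- Line-rate sanity check for 64-byte frames on a 10 Gbit/s link.
-- Each frame occupies 20 extra bytes on the wire:
-- 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
local frame_bytes    = 64
local overhead_bytes = 20
local line_rate_bps  = 10 * 1000 * 1000 * 1000

local max_pps = line_rate_bps / ((frame_bytes + overhead_bytes) * 8)
print(("max rate: %.2f Mpps"):format(max_pps / 1e6))    -- 14.88 Mpps

local data_mbps = max_pps * frame_bytes * 8 / 1e6
print(("payload rate: %.0f Mbit/s"):format(data_mbps))  -- ~7619 Mbit/s
```

This matches the ~7616 Mbit/s / ~9990 Mbit/s-with-framing lines in the log, so the first loop is saturating the link.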

However, if I remove the line that sets the src port:

for i, buf in ipairs(bufs) do
    -- packet framework allows simple access to fields in complex protocol stacks
    local pkt = buf:getUdpPacket()
    --pkt.udp:setSrcPort(SRC_PORT_BASE + math.random(0, NUM_FLOWS - 1))
end

I get:

[INFO]  Device 5 (3C:FD:FE:C0:35:40) is up: 10000 MBit/s
[INFO]  1 device is up.
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 9.36 Mpps, 4791 Mbit/s (6288 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 10.04 Mpps, 5143 Mbit/s (6750 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 10.21 Mpps, 5225 Mbit/s (6858 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 9.34 Mpps, 4784 Mbit/s (6279 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 10.17 Mpps, 5205 Mbit/s (6831 Mbit/s with framing)

Is this expected? Do you have an explanation for this? I would like to understand which optimization happens here, or rather which one "does not happen".

For the test I am using an Intel Corporation Ethernet Controller X710 for 10GbE SFP+ and a single thread.
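Happy to run more variants if that helps narrow it down, e.g. (hypothetical modifications of the same loop, using only the calls already shown above):

```lua
-- Variant A: no per-packet work at all, to see whether the slowdown
-- comes from the loop/cast itself or from dropping the field write.
-- for i, buf in ipairs(bufs) do end

-- Variant B: keep the cast and the field write, but use a constant
-- port, to separate the cost of math.random() from setSrcPort().
for i, buf in ipairs(bufs) do
    local pkt = buf:getUdpPacket()
    pkt.udp:setSrcPort(SRC_PORT_BASE)
end
```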