emmericp / MoonGen

MoonGen is a fully scriptable high-speed packet generator built on DPDK and LuaJIT. It can saturate a 10 Gbit/s connection with 64 byte packets on a single CPU core while executing user-provided Lua scripts for each packet. Multi-core support allows for even higher rates. It also features precise and accurate timestamping and rate control.
MIT License

Less code in the sending loop leads to bandwidth decrease. #311

Open edgar-costa opened 3 years ago

edgar-costa commented 3 years ago

Hi Paul,

I should probably ask this in libmoon, but since it is a general question and there is a bit more activity here, I thought it would be more useful to ask it here.

This is just out of curiosity. I observed that with less code in the sending loop I get 9-10 Mpps instead of 14.8 Mpps. For example, using https://github.com/libmoon/libmoon/blob/master/examples/pktgen.lua:

When the main sending loop modifies the src port:

for i, buf in ipairs(bufs) do
    -- packet framework allows simple access to fields in complex protocol stacks
    local pkt = buf:getUdpPacket()
    pkt.udp:setSrcPort(SRC_PORT_BASE + math.random(0, NUM_FLOWS - 1))
end
[INFO]  1 device is up.
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 13.86 Mpps, 7096 Mbit/s (9313 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 14.41 Mpps, 7378 Mbit/s (9684 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 14.87 Mpps, 7613 Mbit/s (9992 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 14.85 Mpps, 7602 Mbit/s (9978 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 14.85 Mpps, 7602 Mbit/s (9977 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 14.88 Mpps, 7616 Mbit/s (9996 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 14.87 Mpps, 7612 Mbit/s (9990 Mbit/s with framing)
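(For reference, ~14.88 Mpps is exactly line rate for 64-byte frames on 10 GbE, which is why the "with framing" column tops out near 10000 Mbit/s. A quick sanity check of the numbers above, counting the standard 20 bytes of per-frame wire overhead:)

```lua
-- Line-rate sanity check for 64-byte frames on a 10 Gbit/s link.
-- Each frame occupies 20 extra bytes on the wire:
-- 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
local frame_bytes    = 64
local overhead_bytes = 20
local line_rate_bps  = 10 * 1000 * 1000 * 1000

local max_pps = line_rate_bps / ((frame_bytes + overhead_bytes) * 8)
print(("max rate: %.2f Mpps"):format(max_pps / 1e6))    -- 14.88 Mpps

local data_mbps = max_pps * frame_bytes * 8 / 1e6
print(("payload rate: %.0f Mbit/s"):format(data_mbps))  -- ~7619 Mbit/s
```

This matches the ~7616 Mbit/s / ~9990 Mbit/s-with-framing lines in the log, so the first loop is saturating the link.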

However, if I remove the line that sets the src port:

for i, buf in ipairs(bufs) do
    -- packet framework allows simple access to fields in complex protocol stacks
    local pkt = buf:getUdpPacket()
    --pkt.udp:setSrcPort(SRC_PORT_BASE + math.random(0, NUM_FLOWS - 1))
end

I get:

[INFO]  Device 5 (3C:FD:FE:C0:35:40) is up: 10000 MBit/s
[INFO]  1 device is up.
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 9.36 Mpps, 4791 Mbit/s (6288 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 10.04 Mpps, 5143 Mbit/s (6750 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 10.21 Mpps, 5225 Mbit/s (6858 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 9.34 Mpps, 4784 Mbit/s (6279 Mbit/s with framing)
[Device: id=5] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=5] TX: 10.17 Mpps, 5205 Mbit/s (6831 Mbit/s with framing)

Is this expected? Do you have an explanation for this? I would like to understand which optimization happens here, or rather which one "does not happen".

For the test I am using an Intel Corporation Ethernet Controller X710 for 10GbE SFP+ and a single thread.
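Happy to run more variants if that helps narrow it down, e.g. (hypothetical modifications of the same loop, using only the calls already shown above):

```lua
-- Variant A: no per-packet work at all, to see whether the slowdown
-- comes from the loop/cast itself or from dropping the field write.
-- for i, buf in ipairs(bufs) do end

-- Variant B: keep the cast and the field write, but use a constant
-- port, to separate the cost of math.random() from setSrcPort().
for i, buf in ipairs(bufs) do
    local pkt = buf:getUdpPacket()
    pkt.udp:setSrcPort(SRC_PORT_BASE)
end
```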