emmericp / MoonGen

MoonGen is a fully scriptable high-speed packet generator built on DPDK and LuaJIT. It can saturate a 10 Gbit/s connection with 64 byte packets on a single CPU core while executing user-provided Lua scripts for each packet. Multi-core support allows for even higher rates. It also features precise and accurate timestamping and rate control.
MIT License

incAndWrap is slower than a naive add-and-modulo #249

Closed SolalPirelli closed 4 years ago

SolalPirelli commented 5 years ago

I'm trying to send 10G traffic with a single core on an Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz.

This works fine when my loop is:

                for _, buf in ipairs(bufs) do
                        local pkt = buf:getUdpPacket()
                        pkt.udp.dst = dstPort
                        dstPort = (dstPort + 1) % flowCount
                end

But not when I replace the add-and-modulo with incAndWrap:

                for _, buf in ipairs(bufs) do
                        local pkt = buf:getUdpPacket()
                        pkt.udp.dst = dstPort
                        dstPort = incAndWrap(dstPort, flowCount)
                end

The latter can only do ~12.1 Mpps, whereas the former does ~14.8 Mpps.

emmericp commented 4 years ago

The default is optimized to be reasonably fast across a larger range of values with predictable performance. Modulo and branches both have pathological bad cases that make them unsuitable as a default implementation.
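For reference, here is a minimal sketch of the two alternatives mentioned above, a modulo-based and a branch-based increment-and-wrap. The helper names `wrapModulo` and `wrapBranch` are hypothetical and used only for illustration. Neither is MoonGen's `incAndWrap`, whose implementation is not shown in this thread. The sketch only checks that the two variants produce the same sequence. Which one is faster likely depends on the CPU, on the value of `flowCount`, and on how LuaJIT compiles the loop, so it is worth measuring in your own script.

```lua
-- Hypothetical helpers for illustration; not MoonGen's incAndWrap.

-- Modulo variant: same arithmetic as the first loop in the report.
local function wrapModulo(val, count)
	return (val + 1) % count
end

-- Branch variant: compare and reset instead of dividing.
local function wrapBranch(val, count)
	val = val + 1
	if val >= count then
		return 0
	end
	return val
end

-- Step both counters and check that they stay in agreement.
local flowCount = 4
local a, b = 0, 0
local seq = {}
for i = 1, 6 do
	a = wrapModulo(a, flowCount)
	b = wrapBranch(b, flowCount)
	assert(a == b)
	seq[i] = a
end
print(table.concat(seq, " ")) -- prints "1 2 3 0 1 2"
```

Both variants assume `0 <= val < count`. Outside that range they differ: for example, `wrapBranch` returns 0 for any `val + 1 >= count`, while `wrapModulo` returns the remainder.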