incAndWrap is slower than a naive add-and-modulo

emmericp / MoonGen

MoonGen is a fully scriptable high-speed packet generator built on DPDK and LuaJIT. It can saturate a 10 Gbit/s connection with 64 byte packets on a single CPU core while executing user-provided Lua scripts for each packet. Multi-core support allows for even higher rates. It also features precise and accurate timestamping and rate control.

MIT License


I'm trying to send 10 Gbit/s of traffic from a single core on an Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz.

This works fine when my loop is:

    for _, buf in ipairs(bufs) do
        local pkt = buf:getUdpPacket()
        pkt.udp.dst = dstPort
        dstPort = (dstPort + 1) % flowCount
    end

But not when I replace the add-and-modulo with incAndWrap:

    for _, buf in ipairs(bufs) do
        local pkt = buf:getUdpPacket()
        pkt.udp.dst = dstPort
        dstPort = incAndWrap(dstPort, flowCount)
    end

The incAndWrap version only reaches ~12.1 Mpps, whereas the add-and-modulo version reaches ~14.8 Mpps.
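To check whether the gap comes from the counter update itself rather than from the packet path, the two variants can be timed in isolation. The following is only a rough sketch in plain Lua: `wrapCompare` is a hypothetical compare-and-reset stand-in, not MoonGen's actual `incAndWrap` (whose implementation and wrap boundary may differ), and the `flowCount` and iteration values are assumptions since the issue does not state them.

```lua
-- Hypothetical stand-in for incAndWrap; the real MoonGen helper may differ.
local function wrapCompare(val, max)
	-- Reset to 0 once the incremented value reaches max, mirroring (val + 1) % max.
	local nxt = val + 1
	if nxt >= max then
		return 0
	end
	return nxt
end

-- The naive variant from the first loop in the issue.
local function wrapModulo(val, max)
	return (val + 1) % max
end

-- Run `step` `iterations` times; return elapsed CPU seconds and the final counter value.
local function bench(step, iterations, max)
	local v = 0
	local start = os.clock()
	for _ = 1, iterations do
		v = step(v, max)
	end
	return os.clock() - start, v
end

local flowCount = 1000       -- assumed; not given in the issue
local iterations = 10000000  -- assumed; raise it for more stable timings

local tMod, vMod = bench(wrapModulo, iterations, flowCount)
local tCmp, vCmp = bench(wrapCompare, iterations, flowCount)

print(string.format("modulo:  %.3f s (final value %d)", tMod, vMod))
print(string.format("compare: %.3f s (final value %d)", tCmp, vCmp))
```

Timings from a loop like this are only indicative: a JIT compiler may optimize such a tight loop very differently from the real per-packet loop, so the Mpps numbers measured in MoonGen remain the authoritative comparison.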

incAndWrap is slower than a naive add-and-modulo #249