Closed: SolalPirelli closed this issue 4 years ago.
The default is optimized to be reasonably fast across a larger range of values, with predictable performance; using modulo or branches has some pathologically bad cases, which makes them unsuitable as a default implementation.
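For reference, here is a minimal sketch of the two alternatives mentioned above for incrementing a ring index with wrap-around, plus the common bitmask trick for power-of-two sizes. The names `inc_mod`, `inc_branch` and `inc_mask` are hypothetical; this is not the actual `incAndWrap` implementation, and it only shows that the variants compute the same result, not how fast each one is on real hardware.

```python
def inc_mod(i: int, n: int) -> int:
    """Add-and-modulo: valid for any ring size n, but compiles to a division
    unless n is a compile-time constant the compiler can optimize."""
    return (i + 1) % n


def inc_branch(i: int, n: int) -> int:
    """Add-and-branch: no division, but the wrap-around is a conditional."""
    i += 1
    return 0 if i == n else i


def inc_mask(i: int, n: int) -> int:
    """Add-and-mask: only correct when n is a power of two."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    return (i + 1) & (n - 1)


if __name__ == "__main__":
    # All three agree on a 512-entry ring, including at the wrap point.
    for i in (0, 5, 510, 511):
        print(i, inc_mod(i, 512), inc_branch(i, 512), inc_mask(i, 512))
```

Which variant wins depends on the ring size, the compiler and the CPU, which is why a generic default has to avoid the slow paths of each.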
I'm trying to send 10G traffic with a single core on an Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz. This works fine when my loop uses add-and-modulo, but not when I replace the add-and-modulo with `incAndWrap`: the latter can only do ~12.1 Mpps, whereas the former does ~14.8 Mpps.
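To put those numbers in context, the sketch below computes the theoretical 10 Gbit/s line rate and the per-packet cycle budget. It assumes minimum-size 64-byte Ethernet frames (the packet size is not stated above) plus the standard 20 bytes of per-frame wire overhead (8 bytes preamble/SFD, 12 bytes inter-frame gap), and uses the nominal 3.3 GHz clock while ignoring turbo. Under those assumptions ~14.8 Mpps is essentially line rate, and the helper names are illustrative only.

```python
LINK_BPS = 10e9          # 10 Gbit/s link
FRAME_BYTES = 64         # assumption: minimum-size Ethernet frame incl. FCS
OVERHEAD_BYTES = 8 + 12  # preamble + SFD, plus inter-frame gap
CLOCK_HZ = 3.3e9         # nominal E5-2667 v2 clock (turbo ignored)


def line_rate_pps(link_bps: float, frame_bytes: int,
                  overhead_bytes: int = OVERHEAD_BYTES) -> float:
    """Maximum packets per second on the wire for a given frame size."""
    return link_bps / ((frame_bytes + overhead_bytes) * 8)


def cycles_per_packet(pps: float, clock_hz: float = CLOCK_HZ) -> float:
    """CPU cycles available per packet on a single core at a given rate."""
    return clock_hz / pps


if __name__ == "__main__":
    max_pps = line_rate_pps(LINK_BPS, FRAME_BYTES)
    print(f"line rate: {max_pps / 1e6:.2f} Mpps")  # line rate: 14.88 Mpps
    for mpps in (14.8, 12.1):
        budget = cycles_per_packet(mpps * 1e6)
        print(f"{mpps} Mpps -> {budget:.1f} cycles/packet")
```

Going from ~14.8 to ~12.1 Mpps corresponds to roughly 50 extra cycles per packet under these assumptions.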