emmericp / MoonGen

MoonGen is a fully scriptable high-speed packet generator built on DPDK and LuaJIT. It can saturate a 10 Gbit/s connection with 64-byte packets on a single CPU core while executing user-provided Lua scripts for each packet. Multi-core support allows for even higher rates. It also features precise and accurate timestamping and rate control.
MIT License

Poor Moongen Performance #228

Closed mdr78 closed 6 years ago

mdr78 commented 6 years ago

Hi all,

Inexplicably, I seem to be getting very poor performance when benchmarking with MoonGen. I have given it lots of cores, but it seems to settle at around 4.5 Mpps.

It shouldn't be a slow system: I am benchmarking FD.io VPP, and VPP doesn't seem stretched. Am I doing anything obviously wrong below?

Ray K

vpp# show cpu
Model name:           Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Microarchitecture:    Haswell (Haswell E)
Flags:                sse3 ssse3 sse41 sse42 avx avx2 aes invariant_tsc
Base frequency:       2.29 GHz

$ ./moongen-simple start load-latency:0:1:rate=14Mp/s,time=10s --dpdk-conf=/root/setup/XL710/dpdk-conf.lua
[INFO] Initializing DPDK. This will take a few seconds...
EAL: Detected 72 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1563 net_ixgbe
EAL: PCI device 0000:04:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1563 net_ixgbe
[INFO] Found 2 usable devices:
  Device 0: A0:36:9F:8E:A7:4C (Intel Corporation Ethernet Controller 10G X550T)
  Device 1: A0:36:9F:8E:A7:4D (Intel Corporation Ethernet Controller 10G X550T)
[ERROR] 1 errors found while processing flow "load-latency".
[WARN] Unknown option "time".
[INFO] Flow load-latency => 0x1
PMD: ixgbe_dev_link_status_print(): Port 0: Link Up - speed 0 Mbps - half-duplex
PMD: ixgbe_dev_link_status_print(): Port 1: Link Up - speed 0 Mbps - half-duplex
[INFO] Waiting for devices to come up...
[INFO] Device 0 (A0:36:9F:8E:A7:4C) is up: 10000 MBit/s
[INFO] Device 1 (A0:36:9F:8E:A7:4D) is up: 10000 MBit/s
[INFO] 2 devices are up.
[Device: id=0] TX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=1] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 7.86 Mpps, 4023 Mbit/s (5281 Mbit/s with framing)
[Device: id=1] RX: 7.06 Mpps, 3613 Mbit/s (4742 Mbit/s with framing)
[Flow: dev=0 uid=0x1] TX: 9.47 Mpps, 4848 Mbit/s (6363 Mbit/s with framing)
[Flow: dev=1 uid=0x1] RX: 8.04 Mpps, 4118 Mbit/s (5404 Mbit/s with framing)
[Device: id=0] TX: 5.44 Mpps, 2787 Mbit/s (3658 Mbit/s with framing)
[Device: id=1] RX: 5.07 Mpps, 2595 Mbit/s (3405 Mbit/s with framing)
[Flow: dev=0 uid=0x1] TX: 4.68 Mpps, 2396 Mbit/s (3144 Mbit/s with framing)
[Flow: dev=1 uid=0x1] RX: 4.66 Mpps, 2386 Mbit/s (3132 Mbit/s with framing)

$ cat ~/setup/XL710/dpdk-conf.lua
-- Configuration for all DPDK command line parameters.
-- See DPDK documentation at http://dpdk.org/doc/guides/testpmd_app_ug/run_app.html for details.
-- libmoon tries to choose reasonable defaults, so this config file can almost always be empty.
-- Be careful when running libmoon in a VM that also uses another virtio NIC, e.g., for internet access.
-- In this case it may be necessary to use the blacklist or whitelist features in some configurations.
DPDKConfig {
    -- configure the CPU cores to use, default: all cores
    cores = {1, 2, 3, 4, 5, 6, 7, 8},

    -- max number of shared tasks running on core 0
    --sharedCores = 1,

    -- black or whitelist devices to limit which PCI devs are used by DPDK
    -- only one of the following examples can be used
    --pciBlacklist = {"0000:81:00.3","0000:81:00.1"},
    pciWhitelist = {"0000:04:00.0","0000:04:00.1"},

    -- arbitrary DPDK command line options
    -- the following configuration allows multiple DPDK instances (use together with pciWhitelist)
    -- cf. http://dpdk.org/doc/guides/prog_guide/multi_proc_support.html#running-multiple-independent-dpdk-applications
    cli = {
            "--file-prefix", "mg1",
            "--socket-mem", "4096,0",
    }

}
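
For context on the cli block above: --file-prefix and --socket-mem only matter when several DPDK applications share a host, since each instance needs its own hugepage file prefix and its own memory reservation. A second, independent instance could then use a config along these lines (the core set, prefix, and PCI addresses here are invented for illustration):

DPDKConfig {
    -- hypothetical: a disjoint core set so the two instances don't fight over cores
    cores = {9, 10, 11, 12},

    -- hypothetical: the second instance's own NICs
    pciWhitelist = {"0000:05:00.0","0000:05:00.1"},

    cli = {
            -- a unique prefix keeps this instance's hugepage files separate from "mg1"
            "--file-prefix", "mg2",
            -- reserve this instance's own 4 GiB on NUMA socket 0
            "--socket-mem", "4096,0",
    }
}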

hashkash commented 6 years ago

I'm actually facing a similar problem with a Mellanox NIC. Even if I allocate more cores, only the first core in the list is used. With load-l3-latency.lua I can only hit 7.5 Mpps; with load-l2-latency.lua I can reach 14.4 Mpps on a 10G card.

emmericp commented 6 years ago

The DPDK config is just that, the DPDK config: it only determines which cores are available. For the full API, there's a simple example for multi-threading: https://github.com/libmoon/libmoon/blob/master/examples/pktgen.lua
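
The linked pktgen.lua boils down to the pattern below (a condensed, untested sketch of the libmoon API as used there; the packet fields and counts are placeholders, so consult the linked file for the authoritative version):

local lm     = require "libmoon"
local device = require "device"
local memory = require "memory"

local PKT_LEN = 60

function master()
    -- one tx queue per task; this is what actually spreads load over cores
    local dev = device.config{port = 0, txQueues = 4}
    device.waitForLinks()
    for i = 1, 4 do
        -- each startTask() call runs the slave function on its own core
        lm.startTask("txSlave", dev:getTxQueue(i - 1))
    end
    lm.waitForTasks()
end

function txSlave(queue)
    -- mempool with a template applied to every newly allocated packet buffer
    local mempool = memory.createMemPool(function(buf)
        buf:getUdpPacket():fill{pktLength = PKT_LEN}
    end)
    local bufs = mempool:bufArray()
    while lm.running() do
        bufs:alloc(PKT_LEN)
        bufs:offloadUdpChecksums()
        queue:send(bufs)
    end
end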

moongen-simple defaults to one thread; you can specify the same flow multiple times: each flow gets a new core.
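
Assuming the flow syntax from the log above (and dropping the rejected time option), spreading the load over two cores could then look like this untested sketch; whether the rate applies per flow or per device is not verified here:

$ ./moongen-simple start load-latency:0:1:rate=7Mp/s load-latency:0:1:rate=7Mp/s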