marcofaltelli opened this issue 3 years ago
Can you post the code that you use for receiveSlave?
Oops, sorry, I forgot to paste it. Here it is:
```lua
function receiveSlave(rxQueue, rxDev, size)
	log:info(green("Starting up: ReceiveSlave"))
	local mempool = memory.createMemPool()
	local rxBufs = mempool:bufArray()
	local rxCtr = stats:newDevRxCounter(rxDev, "plain")
	-- this will catch a few packets but also cause out_of_buffer errors to show some stats
	while mg.running() do
		local rx = rxQueue:tryRecvIdle(rxBufs, 10)
		rxBufs:freeAll()
		rxCtr:update()
	end
	rxCtr:finalize()
end
```
That should work, not sure what is going on here, I'll need to test this on real hardware; I'll get back to this
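For reference, the receive loops in the examples that ship with MoonGen free only the buffers actually returned by the receive call rather than the whole array. A minimal sketch of that pattern (assuming the same memory/stats modules and a tryRecv-style call; not necessarily the cause of the problem here):

```lua
local mg     = require "moongen"
local memory = require "memory"
local stats  = require "stats"

-- Sketch only: the receive pattern used in the bundled examples.
function receiveSlave(rxQueue, rxDev)
	-- bufArray not backed by a mempool; the rx buffers come from the driver
	local rxBufs = memory.bufArray()
	local rxCtr = stats:newDevRxCounter(rxDev, "plain")
	while mg.running() do
		-- returns the number of packets actually received
		local rx = rxQueue:tryRecv(rxBufs, 10)
		-- free only those packets instead of the whole array
		rxBufs:free(rx)
		rxCtr:update()
	end
	rxCtr:finalize()
end
```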
Hi @emmericp, I think I'm having a similar problem. I use this simple, stripped-down example to test multi-core performance:
```lua
local mg     = require "moongen"
local memory = require "memory"
local device = require "device"
local stats  = require "stats"

local PKT_SIZE = 60

function configure(parser)
	parser:description("Generates traffic.")
	parser:argument("dev", "Device to transmit from."):convert(tonumber)
	parser:option("-c --core", "Number of cores."):default(1):convert(tonumber)
end

function master(args)
	dev = device.config({port = args.dev, txQueues = args.core})
	device.waitForLinks()
	for i = 0, args.core - 1 do
		mg.startTask("loadSlave", dev:getTxQueue(i))
	end
	local ctr = stats:newDevTxCounter(dev)
	while mg.running() do
		ctr:update()
		mg.sleepMillisIdle(10)
	end
	ctr:finalize()
end

function loadSlave(queue)
	local mem = memory.createMemPool(function(buf)
		buf:getUdpPacket():fill({
			pktLength = PKT_SIZE
		})
	end)
	local bufs = mem:bufArray()
	while mg.running() do
		bufs:alloc(PKT_SIZE)
		queue:send(bufs)
	end
end
```
On an Intel Xeon Gold 5120 with 14 physical cores (Hyper-Threading disabled) I get the following numbers:

| Cores | Mpps |
|---|---|
| 1 | 21.42 |
| 2 | 15.44 |
| 3 | 13.75 |
| 4 | 13.87 |
| 5 | 13.69 |
| 6 | 13.81 |
On another machine with an Intel Xeon E3-1245 with 4 cores + Hyper-Threading (8 logical cores) I get the following:

| Cores | Mpps |
|---|---|
| 1 | 21.40 |
| 2 | 34.64 |
| 3 | 33.96 |
| 4 | 34.65 |
| 5 | 42.62 |
| 6 | 42.65 |
In this last case I'm able to saturate the link, but I'm wasting a lot of cores. On both machines I'm able to saturate the link with just two cores using pktgen-dpdk (v20.11.3 on DPDK 20.08).
Hi, I'm trying to saturate an Intel XL710 NIC with 64B packets. On a single core I manage to obtain 21 Mpps (about 11 Gbps). From your paper I understood that these NICs can reach up to 22 Gbps with 64B packets, so I tried to create multiple sender slaves on different cores. The results are kind of strange: I get around 13 Mpps received in total, but that is also the number reported by every Tx queue's statistics, even though in my code I created three different Tx counters, one for each Tx queue (see below).
My master and slave functions are as follows. They are taken from this test of the software-switches suite.
Do you have any best practices for scaling to multiple queues and cores on the same NIC? I also tried the tx-multi-core.lua test you used for your paper, but those scripts are no longer compatible. Cheers
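For comparison, a minimal multi-queue sketch in the spirit of the bundled MoonGen examples (not the script referenced above; the per-queue stats:newManualTxCounter counters and the argument handling are assumptions) would run one task per Tx queue, each with its own software counter, since the device-level counters (newDevTxCounter) report totals for the whole port rather than a single queue:

```lua
local mg     = require "moongen"
local memory = require "memory"
local device = require "device"
local stats  = require "stats"

local PKT_SIZE = 60

function configure(parser)
	parser:argument("dev", "Device to transmit from."):convert(tonumber)
	parser:option("-c --core", "Number of Tx queues/cores."):default(1):convert(tonumber)
end

function master(args)
	-- one Tx queue per core
	local dev = device.config{port = args.dev, txQueues = args.core}
	device.waitForLinks()
	for i = 0, args.core - 1 do
		mg.startTask("loadSlave", dev:getTxQueue(i), i)
	end
	mg.waitForTasks()
end

function loadSlave(queue, id)
	local mem = memory.createMemPool(function(buf)
		buf:getUdpPacket():fill{ pktLength = PKT_SIZE }
	end)
	local bufs = mem:bufArray()
	-- per-queue software counter; hardware/device counters are per port
	local ctr = stats:newManualTxCounter("txQueue " .. id, "plain")
	while mg.running() do
		bufs:alloc(PKT_SIZE)
		-- send() returns the number of packets sent; credit them to this queue
		ctr:updateWithSize(queue:send(bufs), PKT_SIZE)
	end
	ctr:finalize()
end
```

The aggregate rate can still be read from a single newDevTxCounter on the port (as in the script earlier in this thread); the per-queue manual counters are only there to see how the load splits across queues.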