aneesullah opened this issue 2 years ago
Hi @aneesullah,
Can you say more about your test setup?
The QDMA performance report was characterized separately by a different team, not using OpenNIC; however, performance should be similar since both use the QDMA.
The performance using pktgen-dpdk should be something along the lines of the following, though it depends on machine capabilities:
Ports 0-1 of 2 <Main Page> Copyright(c) <2010-2021>, Intel Corporation
Flags:Port : -------Range :0 -------Range :1
PMD: qdma_dev_link_update(): Link update done
Link State : <UP-100000-FD> <UP-100000-FD> ---Total Rate---
Pkts/s Rx : 7,679,047 7,666,897 15,345,944
Tx : 7,743,488 7,743,488 15,486,976
MBits/s Rx/Tx : 47,762/48,153 49,338/49,827 97,101/97,981
Pkts/s Rx Max : 27,320,352 7,774,715 27,320,352
Tx Max : 36,211,872 7,833,377 36,211,872
Broadcast : 0 0
Multicast : 0 0
Sizes 64 : 892,988,087 700,471,032
65-127 : 14,667,088,513 8,375,758,013
128-255 : 48,810,275,136 22,767,085,962
256-511 : 59,827,277,291 70,602,037,739
512-1023 : 117,997,157,741 142,212,997,263
1024-1518 : 115,162,837,292 112,307,731,359
Runts/Jumbos : 0/0 0/0
ARP/ICMP Pkts : 0/0 0/0
Errors Rx/Tx : 0/0 0/0
Total Rx Pkts : 357,351,332,184 356,959,800,953
Tx Pkts : 359,974,053,503 359,438,896,255
Rx/Tx MBs :2,222,514,601/2,238,12,297,667,031/2,313,300,795
Pattern Type : abcd... abcd...
Tx Count/% Rate : Forever /100% Forever /100%
Pkt Size/Tx Burst : 64 / 32 64 / 32
TTL/Port Src/Dest : 64/ 1234/ 5678 64/ 1234/ 5678
Pkt Type:VLAN ID : IPv4 / TCP:0001 IPv4 / TCP:0001
802.1p CoS/DSCP/IPP : 0/ 0/ 0 0/ 0/ 0
VxLAN Flg/Grp/vid : 0000/ 0/ 0 0000/ 0/ 0
IP Destination : 192.168.1.1 192.168.0.1
Source : 192.168.0.1/24 192.168.1.1/24
MAC Destination : 15:16:17:18:19:1a 15:16:17:18:19:1a
Source : 15:16:17:18:19:1a 15:16:17:18:19:1a
PCI Vendor/Addr : 10ee:903f/65:00.0 10ee:913f/65:00.1
-- Pktgen 21.03.0 (DPDK 20.11.0) Powered by DPDK (pid:32576) ----------------
Hi @cneely-amd, thanks a lot for your quick response. Here is the information:
1) Hardware. From lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD Ryzen Threadripper PRO 3975WX 32-Cores
Stepping: 0
Frequency boost: enabled
CPU MHz: 1919.520
CPU max MHz: 3500.0000
CPU min MHz: 2200.0000
BogoMIPS: 7000.66
Virtualization: AMD-V
L1d cache: 1 MiB
L1i cache: 1 MiB
L2 cache: 16 MiB
L3 cache: 128 MiB
NUMA node0 CPU(s): 0-63
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; LFENCE, IBPB conditional, STIBP conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
More details about the hardware:
H/W path                Device          Class         Description
                                        system        AS -5014A-TT (091715D9)
/0                                      bus           M12SWA-TF
/0/0                                    memory        64KiB BIOS
/0/12                                   memory        512GiB System Memory
/0/12/0                                 memory        64GiB DIMM DDR4 Synchron
/0/12/1                                 memory        64GiB DIMM DDR4 Synchron
/0/12/2                                 memory        64GiB DIMM DDR4 Synchron
/0/12/3                                 memory        64GiB DIMM DDR4 Synchron
/0/12/4                                 memory        64GiB DIMM DDR4 Synchron
/0/12/5                                 memory        64GiB DIMM DDR4 Synchron
/0/12/6                                 memory        64GiB DIMM DDR4 Synchron
/0/12/7                                 memory        64GiB DIMM DDR4 Synchron
/0/15                                   memory        2MiB L1 cache
/0/16                                   memory        16MiB L2 cache
/0/17                                   memory        128MiB L3 cache
/0/18                                   processor     AMD Ryzen Threadripper P
/0/100                                  bridge        Starship/Matisse Root Co
/0/100/0.2                              generic       Starship/Matisse IOMMU
/0/100/7.1                              bridge        Starship/Matisse Interna
/0/100/7.1/0                            generic       Starship/Matisse PCIe Du
/0/100/8.1                              bridge        Starship/Matisse Interna
/0/100/8.1/0                            generic       Starship/Matisse Reserve
/0/100/8.1/0.3                          bus           Starship USB 3.0 Host Co
/0/100/8.1/0.3/0        usb9            bus           xHCI Host Controller
/0/100/8.1/0.3/1        usb10           bus           xHCI Host Controller
/0/100/14                               bus           FCH SMBus Controller
/0/100/14.3                             bridge        FCH LPC Bridge
/0/101                                  bridge        Starship/Matisse PCIe Du
/0/102                                  bridge        Starship/Matisse PCIe Du
/0/103                                  bridge        Starship/Matisse PCIe Du
/0/104                                  bridge        Starship/Matisse PCIe Du
/0/105                                  bridge        Starship/Matisse PCIe Du
/0/106                                  bridge        Starship/Matisse PCIe Du
/0/107                                  bridge        Starship/Matisse PCIe Du
/0/108                                  bridge        Starship Device 24; Func
/0/109                                  bridge        Starship Device 24; Func
/0/10a                                  bridge        Starship Device 24; Func
/0/10b                                  bridge        Starship Device 24; Func
/0/10c                                  bridge        Starship Device 24; Func
/0/10d                                  bridge        Starship Device 24; Func
/0/10e                                  bridge        Starship Device 24; Func
/0/10f                                  bridge        Starship Device 24; Func
/0/110                                  bridge        Starship/Matisse Root Co
/0/110/0.2                              generic       Starship/Matisse IOMMU
/0/110/7.1                              bridge        Starship/Matisse Interna
/0/110/7.1/0                            generic       Starship/Matisse PCIe Du
/0/110/8.1                              bridge        Starship/Matisse Interna
/0/110/8.1/0                            generic       Starship/Matisse Reserve
/0/110/8.1/0.1                          generic       Starship/Matisse Cryptog
/0/110/8.1/0.3                          bus           Starship USB 3.0 Host Co
/0/110/8.1/0.3/0        usb7            bus           xHCI Host Controller
/0/110/8.1/0.3/0/2                      bus           USB Virtual Hub
/0/110/8.1/0.3/0/2/1                    input         SMCI HID KM
/0/110/8.1/0.3/0/2/2    enxb03af2b6059f communication RNDIS/Ethernet Gadget
/0/110/8.1/0.3/1        usb8            bus           xHCI Host Controller
/0/110/8.1/0.4                          multimedia    Starship/Matisse HD Audi
/0/111                                  bridge        Starship/Matisse PCIe Du
/0/112                                  bridge        Starship/Matisse PCIe Du
/0/113                                  bridge        Starship/Matisse PCIe Du
/0/114                                  bridge        Starship/Matisse PCIe Du
/0/115                                  bridge        Starship/Matisse PCIe Du
/0/116                                  bridge        Starship/Matisse PCIe Du
/0/117                                  bridge        Starship/Matisse PCIe Du
/0/118                                  bridge        Starship/Matisse Root Co
/0/118/0.2                              generic       Starship/Matisse IOMMU
/0/118/1.1                              bridge        Starship/Matisse GPP Bri
/0/118/3.1                              bridge        Starship/Matisse GPP Bri
/0/118/3.1/0            enp67s0         network       Ethernet interface
/0/118/7.1                              bridge        Starship/Matisse Interna
/0/118/7.1/0                            generic       Starship/Matisse PCIe Du
/0/118/8.1                              bridge        Starship/Matisse Interna
/0/118/8.1/0                            generic       Starship/Matisse Reserve
/0/119                                  bridge        Starship/Matisse PCIe Du
/0/11a                                  bridge        Starship/Matisse PCIe Du
/0/11b                                  bridge        Starship/Matisse PCIe Du
/0/11c                                  bridge        Starship/Matisse PCIe Du
/0/11d                                  bridge        Starship/Matisse PCIe Du
/0/11e                                  bridge        Starship/Matisse PCIe Du
/0/11f                                  bridge        Starship/Matisse PCIe Du
/0/120                                  bridge        Starship/Matisse Root Co
/0/120/0.2                              generic       Starship/Matisse IOMMU
/0/120/3.1                              bridge        Starship/Matisse GPP Bri
/0/120/3.1/0                            bridge        Matisse Switch Upstream
/0/120/3.1/0/1                          bridge        Matisse PCIe GPP Bridge
/0/120/3.1/0/1/0                        storage       NVMe SSD Controller Cx6
/0/120/3.1/0/1/0/0      /dev/nvme0      storage       KCD6XLUL1T92
/0/120/3.1/0/1/0/0/1    /dev/nvme0n1    disk          1920GB NVMe namespace
/0/120/3.1/0/1/0/0/1/1  /dev/nvme0n1p1  volume        511MiB Windows FAT volum
/0/120/3.1/0/1/0/0/1/2  /dev/nvme0n1p2  volume        8191MiB Linux swap volum
/0/120/3.1/0/1/0/0/1/3  /dev/nvme0n1p3  volume        1779GiB EXT4 volume
/0/120/3.1/0/8                          bridge        Matisse PCIe GPP Bridge
/0/120/3.1/0/8/0                        generic       Starship/Matisse Reserve
/0/120/3.1/0/8/0.1                      bus           Matisse USB 3.0 Host Con
/0/120/3.1/0/8/0.1/0    usb1            bus           xHCI Host Controller
/0/120/3.1/0/8/0.1/0/2                  generic       A-U280-A32G
/0/120/3.1/0/8/0.1/0/6                  multimedia    USB Audio
/0/120/3.1/0/8/0.1/1    usb2            bus           xHCI Host Controller
/0/120/3.1/0/8/0.3                      bus           Matisse USB 3.0 Host Con
/0/120/3.1/0/8/0.3/0    usb3            bus           xHCI Host Controller
/0/120/3.1/0/8/0.3/1    usb4            bus           xHCI Host Controller
/0/120/3.1/0/a                          bridge        Matisse PCIe GPP Bridge
/0/120/3.1/0/a/0                        storage       FCH SATA Controller [AHC
/0/120/3.2                              bridge        Starship/Matisse GPP Bri
/0/120/3.2/0                            bus           ASMedia Technology Inc.
/0/120/3.2/0/0          usb5            bus           xHCI Host Controller
/0/120/3.2/0/1          usb6            bus           xHCI Host Controller
/0/120/3.3                              bridge        Starship/Matisse GPP Bri
/0/120/3.3/0            enp103s0        network       I210 Gigabit Network Con
/0/120/3.4                              bridge        Starship/Matisse GPP Bri
/0/120/3.4/0                            bridge        AST1150 PCI-to-PCI Bridg
/0/120/3.4/0/0                          display       ASPEED Graphics Family
/0/120/3.5                              bridge        Starship/Matisse GPP Bri
/0/120/3.5/0                            network       Aquantia Corp.
/0/120/7.1                              bridge        Starship/Matisse Interna
/0/120/7.1/0                            generic       Starship/Matisse PCIe Du
/0/120/8.1                              bridge        Starship/Matisse Interna
/0/120/8.1/0                            generic       Starship/Matisse Reserve
/0/121                                  bridge        Starship/Matisse PCIe Du
/0/122                                  bridge        Starship/Matisse PCIe Du
/0/123                                  bridge        Starship/Matisse PCIe Du
/0/124                                  bridge        Starship/Matisse PCIe Du
/0/125                                  bridge        Starship/Matisse PCIe Du
/0/126                                  bridge        Starship/Matisse PCIe Du
/0/127                                  bridge        Starship/Matisse PCIe Du
/0/1                                    system        PnP device PNP0c02
/0/2                                    system        PnP device PNP0c01
/0/3                                    system        PnP device PNP0b00
/0/4                                    system        PnP device PNP0c02
/0/5                                    communication PnP device PNP0501
/0/6                                    communication PnP device PNP0501
/0/7                                    system        PnP device PNP0c02
/0/8                                    system        PnP device PNP0c02
/1                                      power         To Be Filled By O.E.M.
/2                                      power         To Be Filled By O.E.M.
From numactl --hardware:
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 0 size: 515641 MB
node 0 free: 482304 MB
node distances:
node   0
  0:  10
So NUMA is not enabled in the BIOS; is it required?
2) I generated the Vivado project with:
vivado -mode tcl -source build.tcl -tclargs -board au280 -min_pkt_len 64 -max_pkt_len 9600 -num_cmac_port 2 -num_phys_func 2 -impl 1 -post_impl 1 -jobs 64
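(As a quick sanity check after programming the shell, both physical functions should enumerate under the Xilinx vendor ID, e.g.:
lspci -d 10ee:
which should list the two PFs with the 903f/913f device IDs seen in the pktgen output above.)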
3) The server is connected to a U280 card, both QSFP ports are connected through a loopback cable, and pktgen is run with the command:
sudo pktgen-dpdk-pktgen-20.11.3/usr/local/bin/pktgen -a 43:00.0 -a 43:00.1 -d librte_net_qdma.so -l 4-10 -n 4 -a 40:03.1 -a 40:03.1 -- -m [6:7].0 -m [8:9].1
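(For reference, in this command -a allowlists the PCI devices, -d loads the QDMA PMD shared library, -l 4-10 picks the cores, and -n 4 sets the memory channels; after the --, -m [6:7].0 maps cores 6/7 to RX/TX of port 0 and -m [8:9].1 maps cores 8/9 to RX/TX of port 1.)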
Following is the output:
[pktgen output screenshot: Ports 0-1 of 2 ...]
It seems from the pktgen output that only 64-byte packets are being generated. How can larger packets be generated?
Regards, Anees
Hi @aneesullah in your testing with pktgen-dpdk, can you try something like the following:
range 0 size 64 64 1518 3
range 1 size 1500 64 1518 5
enable 0-1 range
start 0-1
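(If I remember the pktgen-dpdk range syntax correctly, the four values after size are start/min/max/increment:
range <portlist> size <start> <min> <max> <inc>
so the commands above sweep the packet size between 64 and 1518 bytes instead of sending only 64-byte packets.)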
@aneesullah Also, maybe to go along with my above suggestion for how to vary the packet size in pktgen-dpdk, I wanted to mention that in my testing I've been enabling serdes loopback instead of a cable for those quick tests by writing 0x1 to 0x8090 (for port 0) and 0x1 to 0xC090 (for port 1).
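One way to do those writes from user space is with the pcimem utility; this is just a sketch, assuming the OpenNIC shell register space is exposed through BAR2 of PF0 (adjust the BDF and resource file for your system, e.g. using the 65:00.0 address from my output above):
sudo pcimem /sys/bus/pci/devices/0000:65:00.0/resource2 0x8090 w 0x1   # port 0 serdes loopback
sudo pcimem /sys/bus/pci/devices/0000:65:00.0/resource2 0xC090 w 0x1   # port 1 serdes loopback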
Hi @cneely-amd, thanks a lot. That improved things: I'm getting around 70 Gbps with the cable loopback, but still not able to hit the link rate. Here is the pktgen snapshot:
[pktgen screenshot]
I'm also getting around 70 Gbps with serdes loopback. For enabling serdes loopback I used the following:
[screenshot of the register writes]
Any idea?
Regards, Anees
Another related question: pktgen-dpdk allows only a 16-byte user fill pattern for testing. What if we have a large amount of data, from a file or from memory, to transfer? I think that is not supported. What about the dma_to_device and dma_from_device functions from the QDMA driver library: can they be used to transmit custom user data at 100 Gbps with OpenNIC? Or would we need to write our own DPDK app based on the QDMA driver library patched for OpenNIC? Any suggestion for how such functionality could be achieved quickly? Note that we also need to measure TX/RX performance while transferring our data. Thanks a lot
@aneesullah I'm not sure what the best approach would be for improving the performance. I can give two examples of different machine configurations that I have tried recently. Both are using -n 4 and mapping cores as in the example.
~70-80Gbps (fluctuating in that range):
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: GenuineIntel
Model name: 11th Gen Intel(R) Core(TM) i7-11700F @ 2.50GHz
CPU family: 6
Model: 167
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 1
CPU max MHz: 4900.0000
CPU min MHz: 800.0000
BogoMIPS: 4992.00
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 384 KiB (8 instances)
L1i: 256 KiB (8 instances)
L2: 4 MiB (8 instances)
L3: 16 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
RAM: 32 GB
~95Gbps (fairly constant):
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
Stepping: 7
CPU MHz: 800.045
CPU max MHz: 3200.0000
CPU min MHz: 800.0000
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 22528K
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
RAM: 192GB
(Note: I updated the info above because the first time it didn't paste correctly into my message)
I also have a Ryzen 5950 with 32GB for testing, but right now my GPU is using up most of the lanes and I need to swap the order of my PCI cards around before I can test it. I'll try to do that as an experiment when I get a chance.
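(One thing worth checking before and after swapping cards is how many PCIe lanes the card actually negotiated, substituting its BDF:
sudo lspci -vv -s <BDF of the U280> | grep -E 'LnkCap|LnkSta')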
Best regards, --Chris
Hi @aneesullah,
I tried my Ryzen 5950X machine and I'm getting the following:
Ports 0-1 of 2 <Main Page> Copyright(c) <2010-2021>, Intel Corporation
Flags:Port : -------Range :0 -------Range :1
Link State : <UP-100000-FD> <UP-100000-FD> ---Total Rate---
Pkts/s Rx : 7,965,641 7,958,737 15,924,378
Tx : 8,003,200 8,003,200 16,006,400
MBits/s Rx/Tx : 49,627/49,875 51,223/51,520 100,851/101,395
Pkts/s Rx Max : 8,104,052 8,121,805 16,225,857
Tx Max : 8,129,920 8,129,793 16,259,713
Broadcast : 0 0
Multicast : 0 0
Sizes 64 : 2,131,832 2,131,412
65-127 : 44,765,934 25,580,104
128-255 : 147,057,426 68,193,983
256-511 : 183,028,200 217,291,635
512-1023 : 362,044,422 434,493,564
1024-1518 : 351,542,078 342,874,892
Runts/Jumbos : 0/0 0/0
ARP/ICMP Pkts : 0/0 0/0
Errors Rx/Tx : 0/0 0/0
Total Rx Pkts : 1,084,465,571 1,084,468,533
Tx Pkts : 1,085,482,495 1,085,482,367
Rx/Tx MBs : 6,758,742/6,764,658 6,981,025/6,987,758
Pattern Type : abcd... abcd...
Tx Count/% Rate : Forever /100% Forever /100%
Pkt Size/Tx Burst : 64 / 32 64 / 32
TTL/Port Src/Dest : 64/ 1234/ 5678 64/ 1234/ 5678
Pkt Type:VLAN ID : IPv4 / TCP:0001 IPv4 / TCP:0001
802.1p CoS/DSCP/IPP : 0/ 0/ 0 0/ 0/ 0
VxLAN Flg/Grp/vid : 0000/ 0/ 0 0000/ 0/ 0
IP Destination : 192.168.1.1 192.168.0.1
Source : 192.168.0.1/24 192.168.1.1/24
MAC Destination : 15:16:17:18:19:1a 15:16:17:18:19:1a
Source : 15:16:17:18:19:1a 15:16:17:18:19:1a
PCI Vendor/Addr : 10ee:903f/0b:00.0 10ee:913f/0b:00.1
-- Pktgen 21.03.1 (DPDK 20.11.0) Powered by DPDK (pid:3131) -----------------
This is with (as before):
Pktgen:/> range 0 size 64 64 1518 3
Pktgen:/> range 1 size 1500 64 1518 5
Pktgen:/> enable 0-1 range
Pktgen:/> start 0-1
lscpu reports:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 9 5950X 16-Core Processor
CPU family: 25
Model: 33
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU max MHz: 5272.6558
CPU min MHz: 2200.0000
BogoMIPS: 6799.19
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 512 KiB (16 instances)
L1i: 512 KiB (16 instances)
L2: 8 MiB (16 instances)
L3: 64 MiB (2 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
RAM: 32GB
(P.S. note: my Ryzen machine might have some overclocking settings enabled due to the latest Radeon software driver issue in the news.)
Hi @cneely-amd, I checked on another machine where NUMA nodes are not enabled in the BIOS, and I am able to get 100 Gbps. Thanks for your help. Any idea why enabling NUMA reduces the speed on the other machine, or is it something else? Regards, Anees
Hi @aneesullah, I would guess that it might have to do with NUMA and the allocation of hugepages, and their locality to whichever processor cores are specified in the test, but that is just a guess.
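If you want to dig in, a couple of quick checks (standard sysfs paths; substitute your card's address, e.g. the 43:00.0 from your pktgen command) are which NUMA node the card is attached to and how the hugepages are distributed across nodes:
cat /sys/bus/pci/devices/0000:43:00.0/numa_node
cat /sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages
--Chris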
Hi @aneesullah and @cneely-amd, I am using an Alveo U200 and have run tests with pktgen. However, the observed transfer rates are about 300 Mbit/s for transmission and 9000 Mbit/s for reception. I'm seeking guidance on how to raise the throughput toward the optimal 100 Gbps. I have configured the BIOS settings in accordance with the specifications outlined on the open-nic-dpdk Git page.
During packet transfer I get "Timeout on request to dma internal csr register", "Packet length mismatch error", and "Detected Fatal length mismatch" errors, which block further transfers. Please let me know how to resolve this.
Thanks in advance.
Hi, how can I reproduce the results reported in "Xilinx Answer 71453 QDMA Performance Report" with pktgen? I am only getting 10 Gbps link speed on a Threadripper Pro with a U280 card. Do those results apply only to the QDMA example design, or to OpenNIC as well? Regards, Anees