Second, after following all the steps described in the README (except the previous one), I got the following warning when I executed the cmake .. command:
CMake Warning (dev) in CMakeLists.txt:
Policy CMP0104 is not set: CMAKE_CUDA_ARCHITECTURES now detected for NVCC,
empty CUDA_ARCHITECTURES not allowed. Run "cmake --help-policy CMP0104"
for policy details. Use the cmake_policy command to set the policy and
suppress this warning.
CUDA_ARCHITECTURES is empty for target "l2fwdnv".
This warning is for project developers. Use -Wno-dev to suppress it.
Assuming that it did not affect the compilation, I continued with the README commands, and after running the make -j$(nproc --all) command I got the following error:
[ 33%] Building CUDA object CMakeFiles/l2fwdnv.dir/src/kernel.cu.o
/home/user/Documentos/l2fwd-nv/external/dpdk/x86_64-native-linuxapp-gcc/install/include/rte_common.h(879): warning #1217-D: unrecognized format function type "gnu_printf" ignored
/home/user/Documentos/l2fwd-nv/external/dpdk/x86_64-native-linuxapp-gcc/install/include/rte_log.h(291): warning #1217-D: unrecognized format function type "gnu_printf" ignored
/home/user/Documentos/l2fwd-nv/external/dpdk/x86_64-native-linuxapp-gcc/install/include/rte_log.h(320): warning #1217-D: unrecognized format function type "gnu_printf" ignored
/home/user/Documentos/l2fwd-nv/external/dpdk/x86_64-native-linuxapp-gcc/install/include/rte_debug.h(69): warning #1217-D: unrecognized format function type "gnu_printf" ignored
/usr/lib/gcc/x86_64-linux-gnu/11/include/serializeintrin.h(41): error: identifier "__builtin_ia32_serialize" is undefined
1 error detected in the compilation of "/home/user/Documentos/l2fwd-nv/src/kernel.cu".
make[2]: *** [CMakeFiles/l2fwdnv.dir/build.make:76: CMakeFiles/l2fwdnv.dir/src/kernel.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:139: CMakeFiles/l2fwdnv.dir/all] Error 2
make: *** [Makefile:146: all] Error 2
Guessing that I could comment out that source line, I modified line 41 of /usr/lib/gcc/x86_64-linux-gnu/11/include/serializeintrin.h and commented out the call to the __builtin_ia32_serialize function. Despite the previous error, I compiled again and it "worked".
Third, when I ran ./l2fwdnv -h, I got output entirely different from the one shown in the README on GitHub (in fact, the arguments listed there do not work):
************ L2FWD-NV ************
EAL: Detected CPU lcores: 24
EAL: Detected NUMA nodes: 1
Usage: ./l2fwdnv [options]
EAL common options:
-c COREMASK Hexadecimal bitmask of cores to run on
-l CORELIST List of cores to run on
The argument format is <c1>[-c2][,c3[-c4],...]
where c1, c2, etc are core indexes between 0 and 128
--lcores COREMAP Map lcore set to physical cpu set
The argument format is
'<lcores[@cpus]>[<,lcores[@cpus]>...]'
lcores and cpus list are grouped by '(' and ')'
Within the group, '-' is used for range separator,
',' is used for single number separator.
'( )' can be omitted for single element group,
'@' can be omitted if cpus and lcores have the same value
-s SERVICE COREMASK Hexadecimal bitmask of cores to be used as service cores
--main-lcore ID Core ID that is used as main
--mbuf-pool-ops-name Pool ops name for mbuf to use
-n CHANNELS Number of memory channels
-m MB Memory to allocate (see also --socket-mem)
-r RANKS Force number of memory ranks (don't detect)
-b, --block Add a device to the blocked list.
Prevent EAL from using this device. The argument
format for PCI devices is <domain:bus:devid.func>.
-a, --allow Add a device to the allow list.
Only use the specified devices. The argument format
for PCI devices is <[domain:]bus:devid.func>.
This option can be present several times.
[NOTE: allow cannot be used with block option]
--vdev Add a virtual device.
The argument format is <driver><id>[,key=val,...]
(ex: --vdev=net_pcap0,iface=eth2).
--iova-mode Set IOVA mode. 'pa' for IOVA_PA
'va' for IOVA_VA
-d LIB.so|DIR Add a driver or driver directory
(can be used multiple times)
--vmware-tsc-map Use VMware TSC map instead of native RDTSC
--proc-type Type of this process (primary|secondary|auto)
--syslog Set syslog facility
--log-level=<level> Set global log level
--log-level=<type-match>:<level>
Set specific log level
--log-level=help Show log types and levels
--trace=<regex-match>
Enable trace based on regular expression trace name.
By default, the trace is disabled.
User must specify this option to enable trace.
--trace-dir=<directory path>
Specify trace directory for trace output.
By default, trace output will created at
$HOME directory and parameter must be
specified once only.
--trace-bufsz=<int>
Specify maximum size of allocated memory
for trace output for each thread. Valid
unit can be either 'B|K|M' for 'Bytes',
'KBytes' and 'MBytes' respectively.
Default is 1MB and parameter must be
specified once only.
--trace-mode=<o[verwrite] | d[iscard]>
Specify the mode of update of trace
output file. Either update on a file can
be wrapped or discarded when file size
reaches its maximum limit.
Default mode is 'overwrite' and parameter
must be specified once only.
-v Display version information on startup
-h, --help This help
--in-memory Operate entirely in memory. This will
disable secondary process support
--base-virtaddr Base virtual address
--telemetry Enable telemetry support (on by default)
--no-telemetry Disable telemetry support
--force-max-simd-bitwidth Force the max SIMD bitwidth
EAL options for DEBUG use only:
--huge-unlink[=existing|always|never]
When to unlink files in hugetlbfs
('existing' by default, no value means 'always')
--no-huge Use malloc instead of hugetlbfs
--no-pci Disable PCI
--no-hpet Disable HPET
--no-shconf No shared config (mmap'd files)
EAL Linux options:
--socket-mem Memory to allocate on sockets (comma separated values)
--socket-limit Limit memory allocation on sockets (comma separated values)
--huge-dir Directory where hugetlbfs is mounted
--file-prefix Prefix for hugepage filenames
--create-uio-dev Create /dev/uioX (usually done by hotplug)
--vfio-intr Interrupt mode for VFIO (legacy|msi|msix)
--vfio-vf-token VF token (UUID) shared between SR-IOV PF and VFs
--legacy-mem Legacy memory mode (no dynamic allocation, contiguous segments)
--single-file-segments Put all hugepage memory in single files
--match-allocations Free hugepages exactly as allocated
The output should instead be:
./build/l2fwdnv [EAL options] -- b|c|d|e|g|m|s|t|w|B|E|N|P|W
-b BURST SIZE: how many pkts x burst to RX
-d DATA ROOM SIZE: mbuf payload size
-g GPU DEVICE: GPU device ID
-m MEMP TYPE: allocate mbufs payloads in 0: host pinned memory, 1: GPU device memory
-n CUDA PROFILER: Enable CUDA profiler with NVTX for nvvp
-p PIPELINES: how many pipelines (each with 1 RX and 1 TX cores) to use
-s BUFFER SPLIT: enable buffer split, 64B CPU, remaining bytes GPU
-t PACKET TIME: force workload time (nanoseconds) per packet
-v PERFORMANCE PKTS: packets to be received before closing the application. If 0, l2fwd-nv keeps running until the CTRL+C
-w WORKLOAD TYPE: who is in charge to swap the MAC address, 0: No swap, 1: CPU, 2: GPU with one dedicated CUDA kernel for each burst of received packets, 3: GPU with a persistent CUDA kernel, 4: GPU with CUDA Graphs
-z WARMUP PKTS: wait this amount of packets before starting to measure performance
I checked the utils.cpp file inside the src folder, and I can see the correct getopt options there:
void l2fwdnv_usage(const char *prgname)
{
    printf("\n\n%s [EAL options] -- b|c|d|e|g|m|s|t|w|B|E|N|P|W\n"
           " -b BURST SIZE: how many pkts x burst to RX\n"
           " -d DATA ROOM SIZE: mbuf payload size\n"
           " -g GPU DEVICE: GPU device ID\n"
           " -m MEMP TYPE: allocate mbufs payloads in 0: host pinned memory, 1: GPU device memory\n"
           " -n CUDA PROFILER: Enable CUDA profiler with NVTX for nvvp\n"
           " -p PIPELINES: how many pipelines (each with 1 RX and 1 TX cores) to use\n"
           " -s BUFFER SPLIT: enable buffer split, 64B CPU, remaining bytes GPU\n"
           " -t PACKET TIME: force exec time (nanoseconds) per packet\n"
           " -v PERFORMANCE PKTS: packets to be received before closing the application. If 0, l2fwd-nv keeps running until the CTRL+C\n"
           " -w WORKLOAD TYPE: who is in charge to swap the MAC address, 0: No swap, 1: CPU, 2: GPU with one dedicated CUDA kernel for each burst of received packets,>
           " -z WARMUP PKTS: wait this amount of packets before starting to measure performance\n",
           prgname);
}
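The usage string above puts the application options after a `--` separator, following the EAL options. As a rough illustration of that convention, here is a minimal sketch, assuming that everything before `--` is consumed by the EAL and only the remainder reaches the application's own option parser. The helper name `app_args_after_separator` is hypothetical and is not part of l2fwd-nv or DPDK.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from l2fwd-nv or DPDK): returns the arguments
// that follow the first "--" on the command line.
// Assumption: arguments before "--" are EAL options, the rest are the
// application's own options.
std::vector<std::string> app_args_after_separator(int argc, const char *const argv[])
{
    assert(argv != nullptr);

    std::vector<std::string> app_args;
    bool seen_separator = false;

    // Start at 1 to skip the program name.
    for (int i = 1; i < argc; ++i) {
        if (!seen_separator) {
            // Everything up to and including "--" is skipped.
            if (std::strcmp(argv[i], "--") == 0)
                seen_separator = true;
            continue;
        }
        app_args.emplace_back(argv[i]);
    }
    return app_args;
}
```

Under this assumption, a command line such as `./l2fwdnv -h` leaves no arguments for the application at all, while `./l2fwdnv -l 0-3 -- -g 0 -w 2` would hand `-g 0 -w 2` to it. This may be why `-h` on its own printed the EAL help, but the output shown here does not prove it.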
Therefore, I am compiling the correct source code but, for some reason, the resulting binary does not behave correctly.
So, the questions are the following:
How can I download Mellanox OFED 5.4?
Are the warnings that I got when running cmake important, and do they need to be resolved? If so, how can I fix them?
Why am I getting an error on the __builtin_ia32_serialize function, and how can I solve it?
Why am I getting a totally different help message, and why do the arguments listed on GitHub (and in the source file) not work?
About the last question, I guess that I am running (or compiling) another example, or a version of the example different from the one shown on GitHub. However, I followed all the steps given there, so maybe the README is missing some key step that needs to be run to compile and run the proper example.
I get multiple errors when trying to compile the l2fwd-nv example.
This is my CUDA driver and version
The errors that I have encountered are detailed as follows:
First, the link to download Mellanox OFED 5.4 (http://www.mellanox.com/page/products_dyn?product_family=26) is broken, so it cannot be installed.
Thank you