davehouser1 opened this issue 3 years ago
Maybe try turning VF trust on from the host.
@NAYANSEN90, how would one accomplish this? Note: the ESXi host uses a separate distributed switch for each SRIOV interface, with a separate port group for each interface built in those switches. I adjusted the port groups to enable promiscuous mode and forged transmits, and also turned on MAC address changes. Same problem. I also found this information on how to adjust this in ESXi, but the commands do not work :( https://kb.vmware.com/s/article/74909
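The ESXi-side procedure was not settled in this thread, but on a Linux host that owns the PF (KVM or bare metal), VF trust is set with iproute2 (`ip link set ... vf N trust on`). A minimal sketch under that assumption; the PF name `enp59s0f0`, the VF index, and the MAC are hypothetical placeholders, and `DRY_RUN` is only a local convenience for previewing the commands:

```shell
#!/bin/sh
# Hypothetical host-side helper for a Linux host that owns the PF (not ESXi).

# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# prepare_vf PF VF_INDEX [MAC]
# - trust on: on i40e PFs an untrusted VF is refused promiscuous mode and
#   cannot change a MAC that the PF assigned.
# - spoofchk off: lets the VF transmit frames whose source MAC differs from
#   its own address.
# - mac (optional): pins the VF MAC from the PF side.
prepare_vf() {
    pf=$1
    vf=$2
    mac=${3:-}
    run ip link set dev "$pf" vf "$vf" trust on
    run ip link set dev "$pf" vf "$vf" spoofchk off
    if [ -n "$mac" ]; then
        run ip link set dev "$pf" vf "$vf" mac "$mac"
    fi
}
```

Usage: `DRY_RUN=1 prepare_vf enp59s0f0 0 02:00:00:00:00:01` prints the three `ip link` commands; drop `DRY_RUN` and run as root to apply them.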
I took a closer look at the drivers RHEL is using; it looks like "iavf" is in use. Does this need to be changed to the i40e driver, or is this acceptable?
[root@taxmd01-trex-rhel admincybertax]# ethtool -i ens224
driver: iavf
version: 3.2.3-k
firmware-version: N/A
expansion-rom-version:
bus-info: 0000:13:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[root@taxmd01-trex-rhel admincybertax]# ethtool -i ens256
driver: iavf
version: 3.2.3-k
firmware-version: N/A
expansion-rom-version:
bus-info: 0000:1b:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
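To check several VF interfaces in one pass, the `ethtool -i` fields above can be condensed to one line per interface. A small sketch, assuming `ethtool` is installed and the interfaces are still bound to a kernel driver (once bound to igb_uio for TRex they no longer appear as network interfaces):

```shell
#!/bin/sh
# driver_summary: read `ethtool -i` output on stdin and print
# "driver version bus-info" on a single line.
driver_summary() {
    awk -F': ' '
        $1 == "driver"   { drv = $2 }
        $1 == "version"  { ver = $2 }
        $1 == "bus-info" { bus = $2 }
        END { printf "%s %s %s\n", drv, ver, bus }
    '
}

# check_ifaces IFACE...: run `ethtool -i` on each interface and summarize it.
check_ifaces() {
    for ifc in "$@"; do
        printf '%s: ' "$ifc"
        ethtool -i "$ifc" | driver_summary
    done
}
```

With the output shown above, `check_ifaces ens224 ens256` would print `ens224: iavf 3.2.3-k 0000:13:00.0` and `ens256: iavf 3.2.3-k 0000:1b:00.0`.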
I read through some documentation on this driver, found here: https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/intel/iavf.html
I found this part interesting:
"Known Issues/Troubleshooting: Bonding fails with VFs bound to an Intel(R) Ethernet Controller 700 series device. If you bind Virtual Functions (VFs) to an Intel(R) Ethernet Controller 700 series based device, the VF slaves may fail when they become the active slave. If the MAC address of the VF is set by the PF (Physical Function) of the device, when you add a slave, or change the active-backup slave, Linux bonding tries to sync the backup slave’s MAC address to the same MAC address as the active slave. Linux bonding will fail at this point. This issue will not occur if the VF’s MAC address is not set by the PF."
Is "iavf" supported, or do I need to change the driver for each of these NICs in some way?
Forgive me if I'm lame when it comes to some of the SRIOV terms (VF, PF, etc.); I'm still learning my way around SRIOV.
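Whether the PF has set a VF's MAC (the condition in the quoted iavf note) and whether the VF is trusted can be read from the PF side with `ip link show dev <PF>`. This assumes a Linux host that owns the PF, not this ESXi guest; the VF line format differs between iproute2 versions, so the sketch accepts both the `MAC` and `link/ether` forms. An all-zero MAC typically means the PF has not assigned one:

```shell
#!/bin/sh
# vf_admin_state: read `ip link show dev PF` output on stdin and print
# "vf <index> mac=<addr> trust=<on|off|?>" for every VF line.
vf_admin_state() {
    awk '
        $1 == "vf" {
            idx = $2; mac = "?"; trust = "?"
            for (i = 3; i <= NF; i++) {
                tok = $i; sub(/,$/, "", tok)
                # Older iproute2 prints "MAC aa:bb:...", newer "link/ether aa:bb:..."
                if ((tok == "MAC" || tok == "link/ether") && i < NF) {
                    mac = $(i + 1); sub(/,$/, "", mac)
                }
                if (tok == "trust" && i < NF) {
                    trust = $(i + 1); sub(/,$/, "", trust)
                }
            }
            printf "vf %s mac=%s trust=%s\n", idx, mac, trust
        }
    '
}
```

Usage on such a host: `ip link show dev enp59s0f0 | vf_admin_state`, where `enp59s0f0` is a hypothetical PF name.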
I haven't worked with ESXi, but iavf is supported by TRex and DPDK. I won't be able to help with the ESXi commands; it could be a permission issue.
@NAYANSEN90, is there anywhere else (other logs?) where I can determine what is causing the "oerrors", or why nothing is being sent?
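One place to look before handing the ports to TRex is the kernel driver's own counters; `ethtool -i` above reports `supports-statistics: yes` for these iavf ports. These are kernel-side counters, not the DPDK counters behind TRex's "oerrors", and counter names vary by driver, so this sketch simply keys on `err`, `drop`, and `fail` substrings:

```shell
#!/bin/sh
# nonzero_errors: read `ethtool -S IFACE` output on stdin and print the
# counters whose name mentions err/drop/fail and whose value is non-zero.
nonzero_errors() {
    awk -F':' '
        NF == 2 {
            name = $1; gsub(/^[ \t]+/, "", name)
            val = $2 + 0
            if (name ~ /err|drop|fail/ && val != 0) print name ": " val
        }
    '
}
```

Usage: `ethtool -S ens224 | nonzero_errors` while the interface is still bound to iavf.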
Try checking whether the MAC address of the VF changes whenever the VF is unbound. You may need to set a fixed MAC address on the VF from the host and use that MAC address in the TRex config.
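If the MACs are pinned from the host, the TRex config has to use the same addresses. A sketch that writes a minimal two-port config in the YAML layout TRex uses for `/etc/trex_cfg.yaml` (`interfaces` plus a per-port `port_info` list with `src_mac`/`dest_mac`). The PCI addresses come from the `bus-info` lines earlier in this thread; the MAC values in the usage example are hypothetical, and the right `dest_mac` depends on whether the ports face a DUT or each other:

```shell
#!/bin/sh
# write_trex_cfg PCI0 SRC0 DST0 PCI1 SRC1 DST1
# Emit a minimal two-port TRex platform config on stdout.
# src_mac should be the MAC pinned on each VF from the host.
write_trex_cfg() {
    cat <<EOF
- port_limit: 2
  version: 2
  interfaces: ['$1', '$4']
  port_info:
    - src_mac: $2
      dest_mac: $3
    - src_mac: $5
      dest_mac: $6
EOF
}
```

For a loopback between the two VFs, each port's `dest_mac` is the other port's `src_mac`: `write_trex_cfg 13:00.0 02:00:00:00:00:01 02:00:00:00:00:02 1b:00.0 02:00:00:00:00:02 02:00:00:00:00:01 > /etc/trex_cfg.yaml`.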
The MACs do not change; I tried applying them statically as well. The kernel and TRex both see these MACs. Do I need to adjust anything in /etc/modprobe.d/tuned.conf? I have "options i40e max_vfs=3,3" in there, but it does not seem to do anything. Here is a screenshot of the MAC addresses in VMware and in TRex.
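Inside this guest only VF devices are visible (ethtool reports iavf, and DPDK probes 8086:154c as net_i40e_vf), so an i40e module option in the guest likely has no PF to act on; VFs are created wherever the PF lives. On a Linux host, the standard kernel interface for that is the `sriov_numvfs` sysfs file. A sketch under that assumption; the PF names are hypothetical, and the optional root argument exists only so the function can be exercised against a scratch directory:

```shell
#!/bin/sh
# set_num_vfs PF COUNT [SYSFS_ROOT]
# Create COUNT VFs on PF via sysfs. The kernel refuses to change a non-zero
# VF count directly, so the count is reset to 0 first.
set_num_vfs() {
    pf=$1
    count=$2
    root=${3:-/sys}
    f="$root/class/net/$pf/device/sriov_numvfs"
    total="$root/class/net/$pf/device/sriov_totalvfs"
    if [ ! -e "$f" ]; then
        echo "no SR-IOV capable PF at $f" >&2
        return 1
    fi
    # Refuse counts above what the device advertises.
    if [ -r "$total" ] && [ "$count" -gt "$(cat "$total")" ]; then
        echo "requested $count VFs, device supports $(cat "$total")" >&2
        return 1
    fi
    echo 0 > "$f"
    echo "$count" > "$f"
}
```

Roughly matching the intent of `max_vfs=3,3` would be `set_num_vfs enp59s0f0 3` and `set_num_vfs enp59s0f1 3`, run as root on that host.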
More troubleshooting notes:
A colleague confirmed that the interfaces we are using have worked with SRIOV in the past on other RHEL systems on the same ESXi host.
I tried running the port groups on standard vSwitches instead of distributed switches: same issue. Note that all security settings are set to Accept in VMware.
I tried running everything on TRex 2.88: same problem.
I converted the interfaces back to VMXnet3, and they work.
I followed these instructions as well, with no luck: (https://github.com/cisco-system-traffic-generator/trex-core/blob/master/doc/trex_appendix_linux_vf_config.asciidoc). We are using i40e NICs; again, the loaded driver is 2.14.13. Is this version supported?
I also wanted to compare the log output for SRIOV (broken) vs VMXnet3 (working). Digging further into the logs, this is what shows up for SRIOV, which fails:
Mar 8 10:35:13 taxmd01-trex-rhel xx[3221]: EAL: Probing VFIO support...
Mar 8 10:35:14 taxmd01-trex-rhel xx[3221]: EAL: Invalid NUMA socket, default to 0
Mar 8 10:35:14 taxmd01-trex-rhel kernel: igb_uio 0000:13:00.0: uio device registered with irq 67
Mar 8 10:35:14 taxmd01-trex-rhel xx[3221]: EAL: Probe PCI driver: net_i40e_vf (8086:154c) device: 0000:13:00.0 (socket 0)
Mar 8 10:35:14 taxmd01-trex-rhel xx[3221]: EAL: Invalid NUMA socket, default to 0
Mar 8 10:35:14 taxmd01-trex-rhel kernel: igb_uio 0000:1b:00.0: uio device registered with irq 68
Mar 8 10:35:14 taxmd01-trex-rhel xx[3221]: EAL: Probe PCI driver: net_i40e_vf (8086:154c) device: 0000:1b:00.0 (socket 0)
Mar 8 10:35:14 taxmd01-trex-rhel xx[3221]: EAL: No legacy callbacks, legacy socket not created
Mar 8 10:35:14 taxmd01-trex-rhel xx[3221]: i40evf_disable_vlan_strip(): Failed to execute command of VIRTCHNL_OP_DISABLE_VLAN_STRIPPING
Mar 8 10:35:14 taxmd01-trex-rhel xx[3221]: get_avx_supported(): AVX2 is not supported in build env
Mar 8 10:35:14 taxmd01-trex-rhel xx[3221]: i40evf_config_promisc(): fail to execute command CONFIG_PROMISCUOUS_MODE
Mar 8 10:35:15 taxmd01-trex-rhel xx[3221]: i40evf_config_promisc(): fail to execute command CONFIG_PROMISCUOUS_MODE
Mar 8 10:35:15 taxmd01-trex-rhel xx[3221]: i40evf_disable_vlan_strip(): Failed to execute command of VIRTCHNL_OP_DISABLE_VLAN_STRIPPING
Mar 8 10:35:15 taxmd01-trex-rhel xx[3221]: get_avx_supported(): AVX2 is not supported in build env
Mar 8 10:35:15 taxmd01-trex-rhel xx[3221]: i40evf_config_promisc(): fail to execute command CONFIG_PROMISCUOUS_MODE
Mar 8 10:35:15 taxmd01-trex-rhel xx[3221]: i40evf_config_promisc(): fail to execute command CONFIG_PROMISCUOUS_MODE
This is what shows up with the VMXnet3 interfaces, which work:
Mar 8 14:09:22 taxmd01-trex-rhel xx[2933]: EAL: Probing VFIO support...
Mar 8 14:09:23 taxmd01-trex-rhel xx[2933]: EAL: Invalid NUMA socket, default to 0
Mar 8 14:09:23 taxmd01-trex-rhel xx[2933]: EAL: Probe PCI driver: net_vmxnet3 (15ad:7b0) device: 0000:13:00.0 (socket 0)
Mar 8 14:09:23 taxmd01-trex-rhel kernel: igb_uio 0000:13:00.0: uio device registered with irq 65
Mar 8 14:09:23 taxmd01-trex-rhel kernel: igb_uio 0000:1b:00.0: uio device registered with irq 66
Mar 8 14:09:23 taxmd01-trex-rhel xx[2933]: EAL: Invalid NUMA socket, default to 0
Mar 8 14:09:23 taxmd01-trex-rhel xx[2933]: EAL: Probe PCI driver: net_vmxnet3 (15ad:7b0) device: 0000:1b:00.0 (socket 0)
Mar 8 14:09:23 taxmd01-trex-rhel xx[2933]: EAL: No legacy callbacks, legacy socket not created
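The two captures differ in more than the error lines (the probed PMD and the IRQ numbers change too), so it can help to diff them mechanically. A sketch, assuming each run's /var/log/messages excerpt was saved to its own file (file names hypothetical) and every line carries the standard syslog prefix shown above:

```shell
#!/bin/sh
# normalize_log: strip the "Mon DD HH:MM:SS host tag[pid]:" syslog prefix
# and de-duplicate, so lines from two runs can be compared.
normalize_log() {
    sed -E 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]+ [^ ]+ [^ ]+ //' | sort -u
}

# only_in_failing FAILING_LOG WORKING_LOG
# Print normalized lines present in the failing run but not the working one.
only_in_failing() {
    a=$(mktemp)
    b=$(mktemp)
    normalize_log < "$1" > "$a"
    normalize_log < "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

Usage: `only_in_failing sriov.log vmxnet3.log`. Expected noise such as the different IRQ numbers and PMD names will still show up alongside the i40evf errors.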
The errors I see here are:
Mar 8 10:35:14 taxmd01-trex-rhel xx[3221]: i40evf_disable_vlan_strip(): Failed to execute command of VIRTCHNL_OP_DISABLE_VLAN_STRIPPING
Mar 8 10:35:14 taxmd01-trex-rhel xx[3221]: get_avx_supported(): AVX2 is not supported in build env
Mar 8 10:35:14 taxmd01-trex-rhel xx[3221]: i40evf_config_promisc(): fail to execute command CONFIG_PROMISCUOUS_MODE
Do these errors mean anything? What next steps can you recommend?
@davehouser1 - Did you have any success here?
System: VMware 6.7, RHEL 7.9 virtual machine, i40e NIC with SRIOV enabled, TRex version 2.87
Problem: when running TRex and trying to generate traffic, all traffic seems to error out, and only "oerrors" increments on both interfaces.
Logs: with verbosity set to 7, I see this in /var/log/messages:
My config:
Driver info:
Troubleshooting:
Notes:
What am I missing here?