cisco-system-traffic-generator / trex-core

VMware vm, rhel 7.9, i40e SRIOV failing #649

Open davehouser1 opened 3 years ago

davehouser1 commented 3 years ago

System: VMware 6.7, RHEL 7.9 virtual machine, i40e NIC with SR-IOV enabled, TRex version 2.87.

Problem: When running TRex and trying to generate traffic, all traffic appears to fail, and only the "oerrors" counter increments on both interfaces.

Logs: With verbosity set to 7, I see this in /var/log/messages:

Mar  8 10:35:13 taxmd01-trex-rhel xx[3221]: EAL: Probing VFIO support...
Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: EAL:   Invalid NUMA socket, default to 0
Mar  8 10:35:14 taxmd01-trex-rhel kernel: igb_uio 0000:13:00.0: uio device registered with irq 67
Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: EAL: Probe PCI driver: net_i40e_vf (8086:154c) device: 0000:13:00.0 (socket 0)
Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: EAL:   Invalid NUMA socket, default to 0
Mar  8 10:35:14 taxmd01-trex-rhel kernel: igb_uio 0000:1b:00.0: uio device registered with irq 68
Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: EAL: Probe PCI driver: net_i40e_vf (8086:154c) device: 0000:1b:00.0 (socket 0)
Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: EAL: No legacy callbacks, legacy socket not created
Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: i40evf_disable_vlan_strip(): Failed to execute command of VIRTCHNL_OP_DISABLE_VLAN_STRIPPING
Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: get_avx_supported(): AVX2 is not supported in build env
Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: i40evf_config_promisc(): fail to execute command CONFIG_PROMISCUOUS_MODE
Mar  8 10:35:15 taxmd01-trex-rhel xx[3221]: i40evf_config_promisc(): fail to execute command CONFIG_PROMISCUOUS_MODE
Mar  8 10:35:15 taxmd01-trex-rhel xx[3221]: i40evf_disable_vlan_strip(): Failed to execute command of VIRTCHNL_OP_DISABLE_VLAN_STRIPPING
Mar  8 10:35:15 taxmd01-trex-rhel xx[3221]: get_avx_supported(): AVX2 is not supported in build env
Mar  8 10:35:15 taxmd01-trex-rhel xx[3221]: i40evf_config_promisc(): fail to execute command CONFIG_PROMISCUOUS_MODE
Mar  8 10:35:15 taxmd01-trex-rhel xx[3221]: i40evf_config_promisc(): fail to execute command CONFIG_PROMISCUOUS_MODE

My config:

# cat /etc/trex_cfg.yaml
### Config file generated by dpdk_setup_ports.py ###

- version: 2
  interfaces: ['13:00.0', '1b:00.0']
  port_info:
      - dest_mac: 00:50:56:bd:22:c3 # MAC OF LOOPBACK TO IT'S DUAL INTERFACE
        src_mac:  00:50:56:bd:e9:36
      - dest_mac: 00:50:56:bd:e9:36 # MAC OF LOOPBACK TO IT'S DUAL INTERFACE
        src_mac:  00:50:56:bd:22:c3

  platform:
      master_thread_id: 0
      latency_thread_id: 1
      dual_if:
        - socket: 0
          threads: [2,3,4,5,6,7]
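
For a two-port loopback like the one in this config, each port's dest_mac should equal the other port's src_mac. The following is a minimal sketch of that check in Python; the function name `check_loopback` is hypothetical and not part of TRex, and the MAC values are copied by hand from the config above rather than read from /etc/trex_cfg.yaml.

```python
def check_loopback(port_info):
    """Return a list of problems found in a two-port loopback port_info list."""
    if len(port_info) != 2:
        return ["expected exactly 2 port_info entries, got %d" % len(port_info)]

    def norm(mac):
        # Compare MACs case-insensitively and ignore stray whitespace.
        return mac.strip().lower()

    first, second = port_info
    problems = []
    if norm(first["dest_mac"]) != norm(second["src_mac"]):
        problems.append("port 0 dest_mac does not match port 1 src_mac")
    if norm(second["dest_mac"]) != norm(first["src_mac"]):
        problems.append("port 1 dest_mac does not match port 0 src_mac")
    return problems


# Values copied from the port_info section of the config above.
port_info = [
    {"dest_mac": "00:50:56:bd:22:c3", "src_mac": "00:50:56:bd:e9:36"},
    {"dest_mac": "00:50:56:bd:e9:36", "src_mac": "00:50:56:bd:22:c3"},
]

problems = check_loopback(port_info)
print(problems or "port_info is cross-wired as a loopback")
```

This only confirms that the config file is internally consistent. It says nothing about whether the hypervisor actually lets those frames out of the VF.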

Driver info:

[root@taxmd01-trex-rhel scripts]# lspci | grep -i Ethernet
0b:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)
13:00.0 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
1b:00.0 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
# modinfo i40e | grep version
version:        2.14.13

Troubleshooting:

Notes:

What am I missing here?

NAYANSEN90 commented 3 years ago

Maybe try setting the vf trust on from the host

davehouser1 commented 3 years ago

@NAYANSEN90, how would I do that? Note: The ESXi host uses a separate distributed switch for each SR-IOV interface, with a separate port network for each interface on its own switch. I adjusted the port networks to enable promiscuous mode and forged transmits, and also turned on MAC address changes. Same problem. I also found this information on how to adjust it in ESXi, but the commands do not work :( https://kb.vmware.com/s/article/74909

I took a closer look at the drivers being used by RHEL; it looks like "iavf" is being used. Does this need to be changed to the i40e driver, or is this acceptable?

[root@taxmd01-trex-rhel admincybertax]# !697
ethtool -i ens224
driver: iavf
version: 3.2.3-k
firmware-version: N/A
expansion-rom-version:
bus-info: 0000:13:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[root@taxmd01-trex-rhel admincybertax]# ethtool -i ens256
driver: iavf
version: 3.2.3-k
firmware-version: N/A
expansion-rom-version:
bus-info: 0000:1b:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
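
When comparing several interfaces, it can help to pull the driver and bus address out of the `ethtool -i` output programmatically. Below is a small Python sketch that parses that output into a dictionary; `parse_ethtool_info` is a hypothetical helper name, and the sample text is a shortened copy of the output above.

```python
def parse_ethtool_info(text):
    """Parse 'key: value' lines from `ethtool -i` output into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so PCI addresses such as
        # 0000:13:00.0 stay intact in the value.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Shortened copy of the output shown above for ens224.
sample = """driver: iavf
version: 3.2.3-k
firmware-version: N/A
expansion-rom-version:
bus-info: 0000:13:00.0
"""

info = parse_ethtool_info(sample)
print(info["driver"], info["bus-info"])
```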

I read through some documentation on this driver, found here: https://www.kernel.org/doc/html/latest/networking/device_drivers/ethernet/intel/iavf.html

Found this part interesting:

"Known Issues/Troubleshooting: Bonding fails with VFs bound to an Intel(R) Ethernet Controller 700 series device. If you bind Virtual Functions (VFs) to an Intel(R) Ethernet Controller 700 series based device, the VF slaves may fail when they become the active slave. If the MAC address of the VF is set by the PF (Physical Function) of the device, when you add a slave, or change the active-backup slave, Linux bonding tries to sync the backup slave's MAC address to the same MAC address as the active slave. Linux bonding will fail at this point. This issue will not occur if the VF's MAC address is not set by the PF."

Is "iavf" supported? Or do I need to change the driver for each of these NICs in some way?

Forgive me if I am shaky on some of the SR-IOV terms (VF, PF, etc.); I am still learning my way around SR-IOV.

NAYANSEN90 commented 3 years ago

I haven't worked with ESXi, but iavf is supported by TRex and DPDK. I won't be able to help with the ESXi commands; it could be a permission issue.

davehouser1 commented 3 years ago

@NAYANSEN90 Is there anywhere else (other logs?) I can look to determine what is causing the "oerrors" and why nothing is being sent?

NAYANSEN90 commented 3 years ago

Try checking whether the MAC address of the VF changes whenever the VF is unbound. Maybe you need to set a fixed MAC address on the VF from the host and use that MAC address in the TRex config.

davehouser1 commented 3 years ago

The MACs do not change; I tried applying them statically as well. The kernel and TRex both see these MACs. Do I need to adjust anything in /etc/modprobe.d/tuned.conf? I have "options i40e max_vfs=3,3" in there, but it does not seem to do anything. Here is a screenshot of the MAC addresses in VMware and in TRex: [image]

davehouser1 commented 3 years ago

More troubleshooting notes:

Mar  8 14:09:22 taxmd01-trex-rhel xx[2933]: EAL: Probing VFIO support...
Mar  8 14:09:23 taxmd01-trex-rhel xx[2933]: EAL:   Invalid NUMA socket, default to 0
Mar  8 14:09:23 taxmd01-trex-rhel xx[2933]: EAL: Probe PCI driver: net_vmxnet3 (15ad:7b0) device: 0000:13:00.0 (socket 0)
Mar  8 14:09:23 taxmd01-trex-rhel kernel: igb_uio 0000:13:00.0: uio device registered with irq 65
Mar  8 14:09:23 taxmd01-trex-rhel kernel: igb_uio 0000:1b:00.0: uio device registered with irq 66
Mar  8 14:09:23 taxmd01-trex-rhel xx[2933]: EAL:   Invalid NUMA socket, default to 0
Mar  8 14:09:23 taxmd01-trex-rhel xx[2933]: EAL: Probe PCI driver: net_vmxnet3 (15ad:7b0) device: 0000:1b:00.0 (socket 0)
Mar  8 14:09:23 taxmd01-trex-rhel xx[2933]: EAL: No legacy callbacks, legacy socket not created

Errors I see here are:

Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: i40evf_disable_vlan_strip(): Failed to execute command of VIRTCHNL_OP_DISABLE_VLAN_STRIPPING
Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: get_avx_supported(): AVX2 is not supported in build env
Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: i40evf_config_promisc(): fail to execute command CONFIG_PROMISCUOUS_MODE

Do these errors mean anything? What next steps can you recommend?
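
To see which driver calls are failing and how often, the log lines can be filtered mechanically. The sketch below is Python; `count_failures` is a hypothetical helper, and the regular expression is an assumption based only on the message format visible in the excerpts in this thread. It counts lines of the form `function(): ... fail ...` per function name.

```python
import re
from collections import Counter

# Assumed format: "<prog>[pid]: function_name(): message containing 'fail'".
FAILURE_RE = re.compile(r"\]: (\w+)\(\): .*fail", re.IGNORECASE)


def count_failures(lines):
    """Count failure messages per reporting function name."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Four lines copied from the first log excerpt in this thread.
sample = [
    "Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: i40evf_disable_vlan_strip(): Failed to execute command of VIRTCHNL_OP_DISABLE_VLAN_STRIPPING",
    "Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: get_avx_supported(): AVX2 is not supported in build env",
    "Mar  8 10:35:14 taxmd01-trex-rhel xx[3221]: i40evf_config_promisc(): fail to execute command CONFIG_PROMISCUOUS_MODE",
    "Mar  8 10:35:15 taxmd01-trex-rhel xx[3221]: i40evf_config_promisc(): fail to execute command CONFIG_PROMISCUOUS_MODE",
]

counts = count_failures(sample)
for name, n in counts.most_common():
    print(name, n)
```

To run it against the real log, pass an open file object for /var/log/messages instead of `sample`. The AVX2 line is skipped because it does not report a failed command.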

vipulagrawal-enea commented 1 year ago

@davehouser1 - Did you have any success here?