allegro0132 / Openwrt-mlnx-ofed


About iommu_groups #2

Open MaurUppi opened 2 years ago

MaurUppi commented 2 years ago

Is this article yours? https://zhuanlan.zhihu.com/p/356437308

I combined it with the following two articles:

https://www.thomas-krenn.com/en/wiki/Enable_Proxmox_PCIe_Passthrough
http://khmel.org/proxmox-debian-10-kvm-enabling-sr-iov-for-mellanox-infiniband-cards.html

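I've hit a problem I can't figure out: every device ends up in the same IOMMU group, which shouldn't be right. Do you know where the problem is?

Current environment:
PVE 7.02
NIC: the same as yours, firmware version 14.31.1014
Motherboard: ASRock C246 WSI
CPU: i5-8400T
The PCIE7 slot is wired directly to the CPU.

(Edit: after searching around, the main problem seems to be that these consumer-class CPUs don't support ACS, so the individual VFs of one physical NIC can't be isolated into separate IOMMU groups. Which CPU are you using?)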
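The group listing below has the one-line-per-device format produced by a common sysfs loop. The post doesn't show the exact command that was used, so the sketch here is an assumption. It walks `/sys/kernel/iommu_groups` and prints `IOMMU Group N` followed by the `lspci -nn` line for each device. If the VFs could be isolated, each `01:00.x` function would normally show up under its own group number rather than all under group 1. `SYSFS` is a hypothetical override variable, added only so the function can be tested against a fake tree.

```shell
#!/usr/bin/env bash
# List every PCI device with the IOMMU group it belongs to, one line per
# device, as "IOMMU Group N <lspci -nn line>".
# SYSFS is a hypothetical override (for testing); it defaults to /sys.
shopt -s nullglob   # no groups at all -> loop body never runs

list_iommu_groups() {
    local root="${SYSFS:-/sys}"
    local dev group
    for dev in "$root"/kernel/iommu_groups/*/devices/*; do
        # dev looks like .../iommu_groups/<N>/devices/<domain:bus:dev.fn>
        group="${dev%/devices/*}"
        group="${group##*/}"
        printf 'IOMMU Group %s ' "$group"
        # -nn adds the [class] and [vendor:device] IDs, -s selects one device
        lspci -nns "${dev##*/}"
    done | sort -n -k3,3   # numeric sort on the group number (field 3)
}

list_iommu_groups
```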


IOMMU Group 1 01:00.0 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
IOMMU Group 1 01:00.1 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
IOMMU Group 1 01:00.2 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] [15b3:1016]
IOMMU Group 1 01:00.3 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] [15b3:1016]
IOMMU Group 1 01:00.4 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] [15b3:1016]
IOMMU Group 1 01:00.5 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] [15b3:1016]
root@asrock:~# lspci -s 01:00 -v
01:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
        Subsystem: Mellanox Technologies Stand-up ConnectX-4 Lx EN, 25GbE dual-port SFP28, PCIe3.0 x8, MCX4121A-ACAT
        Flags: bus master, fast devsel, latency 0, IRQ 16, IOMMU group 1
        Memory at a8000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at a6b00000 [disabled] [size=1M]
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [40] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [180] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [1c0] Secondary PCI Express
        Capabilities: [230] Access Control Services
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core

01:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
        Subsystem: Mellanox Technologies Stand-up ConnectX-4 Lx EN, 25GbE dual-port SFP28, PCIe3.0 x8, MCX4121A-ACAT
        Flags: bus master, fast devsel, latency 0, IRQ 17, IOMMU group 1
        Memory at aa000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at a6a00000 [disabled] [size=1M]
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [40] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [180] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [230] Access Control Services
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core
root@asrock:~# lsmod | grep vfio
vfio_pci               57344  0
vfio_virqfd            16384  1 vfio_pci
irqbypass              16384  2 vfio_pci,kvm
vfio_iommu_type1       36864  0
vfio                   36864  2 vfio_iommu_type1,vfio_pci
root@asrock:~# dmesg |grep -e DMAR -e IOMMU
[    0.019183] ACPI: DMAR 0x000000007A47DCB0 0000C8 (v01 INTEL  EDK2     00000002      01000013)
[    0.019269] ACPI: Reserving DMAR table memory at [mem 0x7a47dcb0-0x7a47dd77]
[    0.086967] DMAR: IOMMU enabled
[    0.221486] DMAR: Host address width 39
[    0.221488] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.221496] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 19e2ff0505e
[    0.221500] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.221505] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[    0.221508] DMAR: RMRR base: 0x000000794b9000 end: 0x000000794d8fff
[    0.221510] DMAR: RMRR base: 0x0000007f000000 end: 0x0000008f7fffff
[    0.221512] DMAR: RMRR base: 0x00000079d18000 end: 0x00000079d97fff
[    0.221514] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
[    0.221517] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.221519] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.224118] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    2.087938] DMAR: No ATSR found
[    2.087940] DMAR: dmar0: Using Queued invalidation
[    2.087946] DMAR: dmar1: Using Queued invalidation
[    2.088902] DMAR: Intel(R) Virtualization Technology for Directed I/O
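Both NIC functions above list `Access Control Services` themselves. Linux only puts a device in its own IOMMU group if every bridge between it and the root complex also provides ACS isolation, so the next place to look is the CPU root port above bus 01. A minimal sketch, assuming the standard sysfs layout where a device's canonical path contains its parent bridges (for example `/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0`). It walks upward from the device and asks `lspci` whether each bridge lists an ACS capability. Run it as root, because unprivileged `lspci` cannot read the extended config space where ACS lives. `SYSFS` is again a hypothetical override used only for testing.

```shell
#!/usr/bin/env bash
# For each PCI bridge between a device and the root complex, report
# whether lspci lists an Access Control Services (ACS) capability.
# Run as root: unprivileged lspci cannot read extended config space.
# SYSFS is a hypothetical override (for testing); it defaults to /sys.

# Print the bridges above a device, nearest first, from its sysfs path.
upstream_bridges() {
    local root="${SYSFS:-/sys}"
    local re='^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$'
    local path name
    path="$(readlink -f "$root/bus/pci/devices/$1")" || return 1
    path="${path%/*}"                # drop the device itself
    while :; do
        name="${path##*/}"
        [[ $name =~ $re ]] || break  # stop at the pci0000:00 host directory
        echo "$name"
        path="${path%/*}"
    done
}

report_acs() {
    local bridge
    for bridge in $(upstream_bridges "$1"); do
        if lspci -vvv -s "$bridge" | grep -q 'Access Control Services'; then
            echo "$bridge: ACS present"
        else
            echo "$bridge: no ACS capability listed"
        fi
    done
}

report_acs 0000:01:00.0
```

If the root port prints `no ACS capability listed`, that would be consistent with the edit in the original post: everything behind that port, including all VFs, gets merged into one group.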
allegro0132 commented 2 years ago

Looks like an IOMMU issue. How did you modify your /etc/default/grub file?
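For reference, on Intel hosts the Proxmox passthrough guides typically add `intel_iommu=on` to `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub, often together with `iommu=pt`. You then run `update-grub` and reboot. The `DMAR: IOMMU enabled` line in the dmesg output above suggests this step was already in place. The sketch below checks which of the two options the running kernel actually booted with. `CMDLINE` is a hypothetical override variable, added only for testing.

```shell
#!/usr/bin/env bash
# Check whether the running kernel was booted with the usual Intel IOMMU
# options. CMDLINE is a hypothetical override; defaults to /proc/cmdline.

# Return 0 if the exact parameter $1 appears on the kernel command line.
has_param() {
    local file="${CMDLINE:-/proc/cmdline}" p
    local -a params=()
    read -r -a params < "$file"   # split the single cmdline line on spaces
    for p in "${params[@]}"; do
        [[ $p == "$1" ]] && return 0
    done
    return 1
}

for opt in intel_iommu=on iommu=pt; do
    if has_param "$opt"; then echo "$opt: set"; else echo "$opt: missing"; fi
done
```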

MaurUppi commented 2 years ago

Found the cause: the Mehlow platform, i.e. the C246 chipset and the CPUs it supports (even Xeon E), does not support SR-IOV.
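One way to tell the NIC's own SR-IOV support apart from the platform's isolation capabilities is to read the standard `sriov_totalvfs` and `sriov_numvfs` sysfs attributes of the physical function. The group listing earlier in the thread already shows four VFs (01:00.2 to 01:00.5), so the NIC does create VFs on this board. What appears to be missing is the ACS isolation that would place them in separate IOMMU groups. A minimal sketch follows. `SYSFS` is a hypothetical override used only for testing.

```shell
#!/usr/bin/env bash
# Report how many SR-IOV VFs a physical function has enabled, using the
# standard sriov_totalvfs / sriov_numvfs sysfs attributes.
# SYSFS is a hypothetical override (for testing); it defaults to /sys.
vf_status() {
    local dev="${SYSFS:-/sys}/bus/pci/devices/$1"
    if [[ ! -r $dev/sriov_totalvfs ]]; then
        echo "$1: no SR-IOV capability exposed"
        return 1
    fi
    echo "$1: $(<"$dev/sriov_numvfs") of $(<"$dev/sriov_totalvfs") VFs enabled"
}

vf_status 0000:01:00.0 || true
# To (re)create VFs as root, write a count (write 0 first if some exist):
#   echo 0 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
#   echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
```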