Microsemi / switchtec-kernel

A kernel module for the Microsemi PCIe switch
GNU General Public License v2.0

PM8576: DMAR fault when trying ntb link up #110

Open rhardik opened 2 years ago

rhardik commented 2 years ago

Hi, I am trying to test data transfer between two Intel CPUs via a PM8576 PCIe switch.

The PCIe switch has 6 endpoint devices connected.

CPU0 is already bound to all 6 of these endpoint ports in switch partition 0, so CPU1 cannot bind to them and partition 1 is empty.

But when I use ntb_tool to test data transfer between the 2 CPUs, it faults during link up.

It worked in one case: when I unbound one PCIe switch endpoint device from CPU0 and bound it to CPU1, link up no longer gave any fault. I repeated the bind/unbind multiple times and the result is the same.

So it seems that if zero endpoint devices are bound to a CPU, NTB link up faults on that CPU; a CPU has to be bound to at least one endpoint device for NTB to work.

I get the error below on CPU1 (partition 1), which is not bound to any switch endpoint.

DMAR: DRHD: handling fault status reg 102 000: DMAR: [DMA Read] Request device [ed:01.1] PASID ffffffff fault addr fffd0000 [fault reason 02] Present bit in context entry is clear
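For readers comparing this fault against the IOMMU aliases discussed in the replies, a minimal sketch of pulling the requester ID, fault address, and reason code out of the fault line may help. The regular expression is an assumption derived from this single log line (the format can differ between kernel versions), and `parse_dmar_fault` is a hypothetical helper, not part of the switchtec tooling.

```python
import re

# Fault line copied from the report above.
FAULT_LINE = (
    "DMAR: [DMA Read] Request device [ed:01.1] PASID ffffffff "
    "fault addr fffd0000 [fault reason 02] "
    "Present bit in context entry is clear"
)

# Assumed layout, based only on the line above; other kernels may print
# the fields differently.
FAULT_RE = re.compile(
    r"\[(?P<op>DMA \w+)\] Request device "
    r"\[(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])\]"
    r".*?fault addr (?P<addr>[0-9a-f]+) \[fault reason (?P<reason>\d+)\]"
)


def parse_dmar_fault(line):
    """Return the fields of a DMAR fault line, or None if it does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    dev = int(m["dev"], 16)
    fn = int(m["fn"])
    return {
        "op": m["op"],
        "requester": f"{m['bus']}:{m['dev']}.{m['fn']}",
        # devfn packs the 5-bit device number and 3-bit function number.
        "devfn": (dev << 3) | fn,
        "addr": int(m["addr"], 16),
        "reason": int(m["reason"]),
    }


if __name__ == "__main__":
    print(parse_dmar_fault(FAULT_LINE))
```

The extracted requester ID is the value the IOMMU had no context entry for, so it is the one to look for among any aliases reported in dmesg.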

Thanks, Hardik

lsgunth commented 2 years ago

The problem likely has nothing to do with the endpoints. There are quirks required in the kernel to ensure NTB works correctly with the IOMMU and to prevent errors like that.

What kernel are you running?

rhardik commented 2 years ago

I'm using the 5.4.115 kernel.

Switchtec-kernel module: `commit dcda8e5673c8b3190cca5ee9b7899fabbd672b8d
Author: Kelvin Cao kelvin.cao@microchip.com
Date: Mon Mar 22 12:36:16 2021 +0000

Update version to 1.7`

Linux kernel:

root@alm-64-abl-cpu:~# uname -a
Linux alm-64-abl 5.4.115-rt57-alm-64-abl #1 SMP PREEMPT_RT Fri May 14 02:55:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

lsgunth commented 2 years ago

Hmm, not sure. I think the quirks for the IOMMU should be in that kernel. Does it work with a newer kernel version? Does it work if you disable the IOMMU?

rhardik commented 2 years ago

Yes, it works after disabling the IOMMU. I have not tried a newer Linux kernel yet, but I can see the quirks in the present kernel.

Attaching quirks.c (quirks.zip).

lsgunth commented 2 years ago

Anything in dmesg about the quirk? Maybe it's failing to create the IOMMU aliases?

rhardik commented 2 years ago

Hi,

dmesg logs:

[ 0.715981] pci 0000:ed:00.1: Setting Switchtec proxy ID aliases

Attached the dmesg logs for Switchtec ($ dmesg | grep Switchtec): dmesg-switchtec.txt

It shows that all partitions are invalid.

lsgunth commented 2 years ago

Hmm, sounds like the requester ID table sizes are not set in a way that the quirk can pick up. I'm not sure if they are -1 or a value greater than the quirk supports. Check your config and try to ensure the tables are no greater than 512 in size and are enabled.
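The two failure cases mentioned above (a size of -1 versus a size above the supported limit) can be told apart with a small check once the raw table size value is known. This is only an illustrative sketch: the 512 limit comes from the comment above, while the 16-bit register width, the treatment of all-ones as "unset", and the `classify_table_size` helper itself are assumptions, not the kernel's actual code.

```python
# Limit quoted in the comment above.
MAX_TABLE_SIZE = 512


def classify_table_size(raw):
    """Classify a raw requester ID table size value.

    Assumes the size is a 16-bit register value, so -1 reads back as 0xFFFF.
    """
    size = raw & 0xFFFF
    if size == 0xFFFF:
        return "unset (-1)"
    if size > MAX_TABLE_SIZE:
        return "too large"
    if size == 0:
        # Assumption: an empty table would give no proxy IDs to alias.
        return "empty"
    return "ok"


if __name__ == "__main__":
    for value in (-1, 0, 16, 512, 513):
        print(value, classify_table_size(value))
```

Under these assumptions, both the -1 case and the oversized case would fail the same "no greater than 512" test, which is consistent with every partition being reported as invalid.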