**Open** · rhardik opened this issue 2 years ago
The problem likely does not have anything to do with the endpoints. There are quirks required in the kernel to ensure NTB works correctly with the IOMMU and to prevent errors like that.
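If it helps, a quick way to confirm the quirk is in your tree and is actually being applied at boot (a rough sketch; the function name and source path are assumptions based on recent mainline kernels):

```sh
# Check the kernel source for the Switchtec NTB DMA-alias quirk
# (function name and path assumed from recent mainline trees):
grep -n "switchtec_ntb_dma_alias" drivers/pci/quirks.c

# At runtime, the quirk logs a line when it applies the proxy ID aliases:
dmesg | grep -i "proxy id aliases"
```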
What kernel are you running?
I'm using the 5.4.115 kernel.

switchtec-kernel module:

```
commit dcda8e5673c8b3190cca5ee9b7899fabbd672b8d
Author: Kelvin Cao <kelvin.cao@microchip.com>
Date:   Mon Mar 22 12:36:16 2021 +0000

    Update version to 1.7
```

Linux kernel:

```
root@alm-64-abl-cpu:~# uname -a
Linux alm-64-abl 5.4.115-rt57-alm-64-abl #1 SMP PREEMPT_RT Fri May 14 02:55:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
```
Hmm, not sure. I think the quirks for the IOMMU should be in that kernel. Does it work with a newer kernel version? Does it work if you disable the IOMMU?
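For a quick test, that is usually done from the kernel command line; a minimal sketch, assuming an Intel IOMMU and a Debian-style GRUB setup:

```sh
# Append intel_iommu=off to GRUB_CMDLINE_LINUX in /etc/default/grub, then:
sudo update-grub
sudo reboot

# After reboot, confirm how the kernel was booted and how DMAR/IOMMU came up:
cat /proc/cmdline
dmesg | grep -i -e DMAR -e IOMMU | head
```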
Yes, it works after disabling the IOMMU. I have not tried a newer Linux kernel yet, but I can see the quirks in the present kernel.
Attaching quirk.c: quirks.zip
Anything in dmesg about the quirk? Maybe it's failing to create the IOMMU aliases?
Hi,
dmesg logs:

```
[    0.715981] pci 0000:ed:00.1: Setting Switchtec proxy ID aliases
```

Attached dmesg logs for Switchtec (`dmesg | grep Switchtec`):
dmesg-switchtec.txt
It shows that all partitions are invalid.
Hmm, sounds like the requester ID table sizes are not set in a way that the quirk can pick them up. I'm not sure if they are -1 or a value greater than the quirk supports. Check your config and try to ensure the tables are no greater than 512 entries in size and are enabled.
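One way to see why each partition is being skipped (a sketch; the exact message wording is an assumption based on the quirk in recent kernels) is to search the full dmesg rather than only lines containing "Switchtec", and optionally enable dynamic debug for quirks.c so the per-alias messages become visible:

```sh
# Grep dmesg broadly for the quirk's per-partition diagnostics:
dmesg | grep -iE "switchtec|partition|proxy|alias"

# If the kernel has CONFIG_DYNAMIC_DEBUG, booting with the option below also
# shows the quirk's pci_dbg() messages:
#   dyndbg="file drivers/pci/quirks.c +p"
```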
Hi, I am trying to test data transfer between two Intel CPUs via a PM8576 PCIe switch.
The PCIe switch has 6 endpoint devices connected.
CPU0 is already bound to all 6 of the above endpoint ports through switch partition 0, so CPU1 cannot bind to these devices and partition 1 is empty.
But when I use ntb_tool to test data transfer between the two CPUs, it gives a fault during link up.
It worked in one case where I unbound one PCIe switch endpoint device from CPU0 and bound it to CPU1; link up then did not give any fault. I checked bind/unbind multiple times and the result is the same.
So I can say that if zero endpoint devices are bound to a CPU, it gives a fault when trying to do NTB link up; in other words, the CPU should be bound to at least one endpoint device to make NTB work.
Getting the error below on CPU1 (partition 1), which is not bound to any switch endpoint.
```
DMAR: DRHD: handling fault status reg 102
DMAR: [DMA Read] Request device [ed:01.1] PASID ffffffff fault addr fffd0000 [fault reason 02] Present bit in context entry is clear
```
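For reference, the link-up step that triggers the fault is done through the in-kernel ntb_tool debugfs interface, roughly as below (a minimal sketch; the debugfs directory name is system-specific and shown as a placeholder):

```sh
# On both hosts, load the Switchtec NTB driver and the test client
# (mainline module names):
modprobe ntb_hw_switchtec
modprobe ntb_tool

# ntb_tool creates a debugfs directory per NTB device; <ntb-dev> is a
# placeholder for that directory on this system:
DBG=/sys/kernel/debug/ntb_tool/<ntb-dev>

# Bring the local side of the link up, then check the peer link state:
echo Y > $DBG/link
cat $DBG/peer0/link
```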
Thanks, Hardik