Xilinx / open-nic-driver

AMD OpenNIC driver includes the Linux kernel driver
GNU General Public License v2.0

Installing driver on Ubuntu 20.04 #30

Open reza-alimadadi opened 2 years ago

reza-alimadadi commented 2 years ago

Although the documentation says to install the driver on Ubuntu 18.04 with Linux kernel version 4.15.0, I want to install it on Ubuntu 20.04 (I don't want to install another Linux on my machine). Unfortunately, I cannot ping another machine. I have programmed my U280 with the open-nic-shell design, and this is the output of dmesg after I load onic.

[  487.260359] OpenNIC Linux Kernel Driver 0.21
[  487.260531] onic 0000:17:00.0: enabling device (0000 -> 0002)
[  487.260778] onic 0000:17:00.0 onic23s0f0 (uninitialized): Set MAC address to 00:0a:35:d2:35:5c
[  487.260780] onic 0000:17:00.0: device is a master PF
[  487.261176] onic 0000:17:00.0: Allocated 8 queue vectors
[  487.360488] onic 0000:17:00.0: Number of CMAC instances = 1
[  487.360530] onic 0000:17:00.0: Setup IRQ vector 609 with name onic23s0f0-0
[  487.360553] onic 0000:17:00.0: Setup IRQ vector 610 with name onic23s0f0-1
[  487.360574] onic 0000:17:00.0: Setup IRQ vector 611 with name onic23s0f0-2
[  487.360594] onic 0000:17:00.0: Setup IRQ vector 612 with name onic23s0f0-3
[  487.360615] onic 0000:17:00.0: Setup IRQ vector 613 with name onic23s0f0-4
[  487.360639] onic 0000:17:00.0: Setup IRQ vector 614 with name onic23s0f0-5
[  487.360658] onic 0000:17:00.0: Setup IRQ vector 615 with name onic23s0f0-6
[  487.360691] onic 0000:17:00.0: Setup IRQ vector 616 with name onic23s0f0-7
[  487.368783] onic 0000:17:00.0 ens81: renamed from onic23s0f0

Could you give me some hints on how to debug it?

cneely-amd commented 2 years ago

Hi @reza-alimadadi ,

Do you have a direct cable connection to the other machine? (RS_FEC might need to be disabled in the driver; it is controlled by a kernel module parameter.)

Have you set up static IPs and arp entries?

Maybe check the RX and TX hardware registers for the corresponding CMAC port?

Best regards, --Chris

cneely-amd commented 2 years ago

Also, a lot of users use Ubuntu 20.04 and it works fine. I've also used Ubuntu 22.04.

reza-alimadadi commented 2 years ago

Thanks for the fast response. I connected it directly to another machine, but the other machine didn't detect the link. I also saw that ctrl_tx/rx_enable were zero, so I manually set them to 1, but that didn't work :( I have a feeling that I am missing something simple, but I cannot find it.

cneely-amd commented 2 years ago

If you want to narrow the problem down with lower-level debugging, you can check the hardware and link status without loading the driver.

The following few commands should let you see the link status (using pcimem or similar to read from BAR2), once you adjust them for your PCI device address:

sudo setpci -s 01:00.0 COMMAND=0x02;

#writes to enable CMAC port 0
sudo ~cneely/pcimem/pcimem /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource2 0x8014 w 0x1;
sudo ~cneely/pcimem/pcimem /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource2 0x800c w 0x1;

#Read the CMAC RX status register twice:
sudo ~cneely/pcimem/pcimem /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource2 0x8204;
sudo ~cneely/pcimem/pcimem /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/resource2 0x8204;

The second readback of the status register should be 0x3 if the link is okay. If you see 0xC0, this usually means the link is disconnected. The status bits are described in the CMAC product guide.
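
As a minimal sketch, the two readback values mentioned above can be checked with a small shell helper. The function name classify_rx_status is hypothetical, and it recognizes only the two values discussed in this thread (0x3 and 0xC0); any other value needs to be decoded against the CMAC product guide.

```shell
# Hypothetical helper, not part of the driver or pcimem.
# Interprets a CMAC RX status readback using only the two values
# discussed in this thread.
classify_rx_status() {
    local value=$(( $1 ))   # accepts hex input such as 0x000000C0
    case "$value" in
        3)   echo "link up" ;;                      # 0x3
        192) echo "no physical link (typical)" ;;   # 0xC0
        *)   printf 'unrecognized status 0x%X: see the CMAC product guide\n' "$value" ;;
    esac
}
```

For example, classify_rx_status 0x000000C0 reports the no-link case, matching the value pcimem prints when the cable or link is down.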

The status could help to indicate a problem with your generated open-nic-shell or cable connection.

If your card has two CMAC ports and if you built the design with two CMAC ports and two phys_func, then you should be able to test your cable by plugging into both ports on the card and checking the link status.

The base address for CMAC port 1 is 0xC000, so repeat the writes and reads at the corresponding addresses to see its link status. For example, the corresponding RX status register is at 0xC204.
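
The per-port addresses can be computed rather than typed by hand. The sketch below assumes that port 0 registers start at 0x8000 and port 1 registers at 0xC000, which is inferred from the 0x8204 and 0xC204 RX status addresses in this thread. The function name cmac_reg_addr is hypothetical.

```shell
# Hypothetical helper: compute a BAR2 register address for a CMAC port.
# Assumption: port 0 base is 0x8000 and port 1 base is 0xC000, inferred
# from the 0x8204 / 0xC204 RX status addresses quoted in this thread.
cmac_reg_addr() {
    local port=$1 offset=$2 base
    case "$port" in
        0) base=0x8000 ;;
        1) base=0xC000 ;;
        *) echo "unknown CMAC port: $port" >&2; return 1 ;;
    esac
    # Add the register offset to the port base and print it as hex.
    printf '0x%X\n' $(( base + offset ))
}
```

With this, the offsets used for port 0 above (0x14, 0xc, and 0x204) can be reused for port 1, for instance by passing "$(cmac_reg_addr 1 0x204)" as the address argument to pcimem.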

attdone commented 11 months ago

Hi @cneely-amd, I am facing a similar issue: I am not able to see the interface enp8s0f0 when I run the commands below.

sudo setpci -s 08:00.0 COMMAND=0x02;
sudo pcimem /sys/devices/pci0000:00/0000:00:01.0/0000:08:00.0/resource2 0x8014 w 0x1;
sudo pcimem /sys/devices/pci0000:00/0000:00:01.0/0000:08:00.0/resource2 0x800c w 0x1;

sudo pcimem /sys/devices/pci0000:00/0000:00:01.0/0000:08:00.0/resource2 0x8204;
sudo pcimem /sys/devices/pci0000:00/0000:00:01.0/0000:08:00.0/resource2 0x8204;
ip a

If I load the driver with insmod onic.ko, the interface appears under ip a. But when I check the status register at 0x8204, it reads 0xC0. Please let me know how I can debug this further.

cneely-amd commented 11 months ago

Hi @attdone ,

The CMAC product guide describes the possible values for that register. "0xC0" is a value that typically occurs if there's no physical link detected. One suggestion above was to experiment with disabling RS_FEC (enabled by default); the driver has a kernel module parameter that you can use to disable it.

Best regards, --Chris

attdone commented 10 months ago

Hi, thanks for the input. I tried disabling RS_FEC in the onic driver, but the device link is still not up even though the QSFP is physically connected.

$ sudo ifconfig enp8s0 192.168.1.1 up
$ sudo ethtool enp8s0 | grep Link
Link detected: no

$ sudo pcimem /sys/devices/pci0000\:00/0000\:00\:01.0/0000\:08\:00.1/resource2 0x8204;
/sys/devices/pci0000:00/0000:00:01.0/0000:08:00.1/resource2 opened.
Target offset is 0x8204, page size is 4096
mmap(0, 4096, 0x3, 0x1, 3, 0x8204)
PCI Memory mapped to address 0x7fbce7c62100.
0x8204: 0x000000C0

If I write 0x1 to register 0x8090 (loopback) and bring the device up, the link comes up.

$ sudo pcimem /sys/devices/pci0000:00/0000:00:01.0/0000:08:00.0/resource2 0x8090 w 0x1;
$ sudo ethtool enp8s0 | grep Link
Link detected: yes

What changes need to be made? Which register should be used?