Xilinx / xup_vitis_network_example

VNx: Vitis Network Examples

VNx basic: number of sockets in hardware = 0 #73

Closed trashcrash closed 2 years ago

trashcrash commented 2 years ago

Run Time Issues

The system environment is at the bottom.

Problem Description:

I have two Alveo U280 cards on the same host, directly connected. I built the design with:

make all DEVICE=xilinx_u280_xdma_201920_3 INTERFACE=3 DESIGN=basic

I was not using Dask. Here is what I was trying to run:

import pynq

for i in range(len(pynq.Device.devices)):
    print("{}) {}".format(i, pynq.Device.devices[i].name))

xclbin = '../basic.intf3.xilinx_u280_xdma_201920_3/vnx_basic_if3.xclbin'
ol_w0 = pynq.Overlay(xclbin,device=pynq.Device.devices[0])
ol_w1 = pynq.Overlay(xclbin,device=pynq.Device.devices[1])

print("Link worker 0 {}; link worker 1 {}".format(ol_w0.cmac_1.link_status(),ol_w1.cmac_1.link_status()))
print(ol_w1.networklayer_1.set_ip_address('192.168.0.10', debug=True))

ol_w1.networklayer_1.sockets[7] = ('192.168.0.5', 62177, 60512, True)

So far so good, and the outputs are:

0) xilinx_u280_xdma_201920_3
1) xilinx_u280_xdma_201920_3
Link worker 0 {'cmac_link': True}; link worker 1 {'cmac_link': True}
{'HWaddr': '00:0a:35:02:9d:0a', 'inet addr': '192.168.0.10', 'gateway addr': '192.168.0.1', 'Mask': '255.255.255.0'}

Then I ran ol_w1.networklayer_1.populate_socket_table() and got this error:

Exception: Socket list length (16) is bigger than the number of sockets in hardware (0)

I then checked vnx_utils.py and found these lines:

numSocketsHW = int(self.register_map.udp_number_sockets)

if numSocketsHW < len(self.sockets):
    raise Exception(
        "Socket list length ({}) is bigger than the \
        number of sockets in hardware ({})".format(
            len(self.sockets), numSocketsHW
        )
    )

It turns out that self.register_map.udp_number_sockets, at address 2576, reads 0, which triggers the exception above. I then printed the values from address 2000 to 3000 before calling populate_socket_table(), using:

for i in range(2000, 3000, 4):
    print(i, ol_w1.networklayer_1.read(ol_w1.networklayer_1.register_map.udp_number_sockets.address+i))

The output shows that only addresses 2048 and 2052 hold nonzero values, 16 and 1 respectively. I also printed the addresses of all the attributes (e.g., eth_in_cycles, app_out_bytes) of ol_w1.networklayer_1.register_map, and neither 2048 nor 2052 is among the addresses in use. Could address 2048 be the actual "number of sockets in hardware", since it matches the socket list length (16)?
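For anyone repeating the scan above, here is a minimal sketch of a helper that keeps only the nonzero entries, which makes a misplaced register easier to spot. The name scan_nonzero and the fake_registers stand-in are hypothetical and not part of the repository; the stand-in values simply mirror the two nonzero readings reported above so the snippet runs without a card. On real hardware, read would presumably be something like ol_w1.networklayer_1.read.

```python
def scan_nonzero(read, start, stop, step=4):
    """Return {offset: value} for each offset in [start, stop) that reads nonzero.

    `read` is any callable that takes an offset and returns an integer.
    """
    found = {}
    for offset in range(start, stop, step):
        value = read(offset)
        if value != 0:
            found[offset] = value
    return found


# Hypothetical stand-in for the hardware: unknown offsets read as 0.
fake_registers = {2048: 16, 2052: 1}


def fake_read(offset):
    return fake_registers.get(offset, 0)


nonzero = scan_nonzero(fake_read, 2000, 3000)
print(nonzero)
```

Passing the read function as an argument keeps the helper independent of the overlay object, so the same code can be tried against a dictionary first and against the card afterwards.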


  1. OS version
     LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
     Distributor ID: CentOS
     Description: CentOS Linux release 7.9.2009 (Core)
     Release: 7.9.2009
     Codename: Core

  2. XRT version
     XRT Build Version: 2.6.655
     Build Version Branch: 2020.1

  3. pynq version
     PYNQ version 2.7.0

mariodruiz commented 2 years ago

Hi @trashcrash,

Did you generate the xclbin in the past 10 days or so? If so, can you please:

  1. Pull the latest changes
  2. Run make distclean in the root directory
  3. Build again

Alternatively, you can get a fresh copy of the repository.

There was a bug in the offset addresses of the network layer, which should be fixed by https://github.com/Xilinx/xup_vitis_network_example/commit/64ff9f99e38dbf9854faebd0f56b06f88b59b62e and https://github.com/Xilinx/xup_vitis_network_example/commit/3cfec6846e724b21976bb3a5c87d53dec282dad7

trashcrash commented 2 years ago

Thanks for your reply. I cloned this repo on June 7th, so I'll pull the latest version and try it out.

trashcrash commented 2 years ago

With the newest version (06/13/2022) compiled, the problem is gone. I printed the register map with print(ol_w1.networklayer_1.register_map), and it now shows udp_number_sockets = Register(value=16), where it used to show value=0.
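In case it helps others who hit the same exception, below is a rough sketch of a check that could be run before populate_socket_table(). The function name check_socket_capacity and its return strings are hypothetical and not part of vnx_utils.py. Treating a hardware count of 0 as a sign of an xclbin built before the offset fix is an assumption based only on this thread.

```python
def check_socket_capacity(num_sockets_hw, num_sockets_sw):
    """Classify the relation between the hardware and software socket counts.

    num_sockets_hw: value read from the udp_number_sockets register.
    num_sockets_sw: length of the software socket list.
    """
    if num_sockets_hw == 0:
        # Assumption from this thread: 0 points to a register offset mismatch,
        # so rebuilding the xclbin from an updated checkout is worth trying.
        return "rebuild-suggested"
    if num_sockets_hw < num_sockets_sw:
        # Same condition that raises the exception in the quoted snippet.
        return "too-many-sockets"
    return "ok"


print(check_socket_capacity(0, 16))
print(check_socket_capacity(16, 16))
```

With the overlay loaded, the two arguments would be int(ol_w1.networklayer_1.register_map.udp_number_sockets) and len(ol_w1.networklayer_1.sockets), as in the vnx_utils.py lines quoted earlier in the thread.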