Xilinx / xup_vitis_network_example

VNx: Vitis Network Examples

mm2s works only once, and limited #76

Closed byeongkeonLee closed 2 years ago

byeongkeonLee commented 2 years ago

Run Time Issues

  1. OS version (lsb_release -a): Ubuntu 18.04.6 LTS, codename bionic

  2. XRT version (xbutil version): 2.12.427

  3. pynq version: 2.7.0

  4. JupyterLab and Dask version, if applicable: notebook 6.4.10; jupyterlab not installed



I would like to ask two questions about vnx-basic-test:

1. For a large buffer, mm2s sends the whole buffer, but the receiver (an Intel E810 NIC) only receives part of it.

This is the TX side:

import numpy as np
import pynq

# ol is the pynq Overlay loaded earlier in the notebook. The mm2s handle
# is assumed by symmetry with the commented s2mm line below.
mm2s = ol.krnl_mm2s_0
#s2mm = ol.krnl_s2mm_0

size = 1408 * 1000
shape = (size, 1)

mm2s_buf = pynq.allocate(shape, dtype=np.uint8, target=ol.bank1)
#s2mm_buf = pynq.allocate(shape, dtype=np.uint8, target=ol.bank1)

# Note: high is exclusive, so this fills the buffer with values 0..254.
mm2s_buf[:] = np.random.randint(low=0, high=(2**8) - 1, size=shape, dtype=np.uint8)

mm2s_buf.sync_to_device()
mm2s_wh = mm2s.start(mm2s_buf, size, 0)

This is the RX side:

import socket
import numpy as np

SW_PORT = 50446
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP
sock.bind(('', SW_PORT))
size = 1408 * 1000

print('sw_port', SW_PORT)

BYTES_PER_PACKET = 1408

# Receive `size` bytes as fixed-size UDP datagrams and reassemble them
# into one contiguous array.
def socket_receive_threaded(sock, size):
    shape_global = (size, 1)
    shape_local = (BYTES_PER_PACKET, 1)
    print('thread start', BYTES_PER_PACKET)
    recv_data_global = np.empty(shape_global, dtype=np.uint8)
    data_partial = np.empty(shape_local, dtype=np.uint8)
    num_it = size // BYTES_PER_PACKET
    print(size, num_it)
    sum_bytes = 0
    connection = 'None'
    for m in range(num_it):
        print(m, SW_PORT)
        # Blocks until a datagram arrives; with no timeout this hangs
        # forever if a packet is dropped.
        res = sock.recvfrom_into(data_partial)
        print('recv', m, '/', num_it, res)
        recv_data_global[m * BYTES_PER_PACKET:(m + 1) * BYTES_PER_PACKET] = data_partial
        sum_bytes += int(res[0])
        connection = res[1]

socket_receive_threaded(sock, size)

The output is as follows:

recv 248 / 1000 (1408, ('192.168.1.149', 60133))
249 50446
recv 249 / 1000 (1408, ('192.168.1.149', 60133))
250 50446
recv 250 / 1000 (1408, ('192.168.1.149', 60133))
251 50446
recv 251 / 1000 (1408, ('192.168.1.149', 60133))
252 50446
recv 252 / 1000 (1408, ('192.168.1.149', 60133))
253 50446
recv 253 / 1000 (1408, ('192.168.1.149', 60133))
254 50446
recv 254 / 1000 (1408, ('192.168.1.149', 60133))
255 50446
recv 255 / 1000 (1408, ('192.168.1.149', 60133))
256 50446

and then it hangs.

Although the sender sent 1000 packets, the receiver received only about 255 (the exact count changes on every run) before hanging. I don't know what happens here.
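(Editor's sketch, not from the thread: the hang is consistent with recvfrom_into blocking indefinitely once a datagram is dropped, since UDP gives no delivery guarantee. A minimal guard, reusing the names from the RX snippet above, is to set a receive timeout so a lost packet raises an exception instead of blocking forever; the 2-second value is an arbitrary choice.)

# Sketch: make the blocking receive fail fast when a packet never arrives.
sock.settimeout(2.0)  # arbitrary timeout in seconds
try:
    res = sock.recvfrom_into(data_partial)
except socket.timeout:
    print('no datagram within 2 s; a packet was likely dropped')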

2. mm2s.start cannot be reused. Once mm2s.start is done, I want to send another pynq buffer. However, calling mm2s.start() again does not work. How can I send another buffer without reloading the xclbin?

Thanks in advance!

mariodruiz commented 2 years ago

Hi @byeongkeonLee,

I would suggest you install pynq 2.8.0.dev0: pip install pynq==2.8.0.dev0

For your first issue, it is likely that the server can't keep up with the sending rate. Try with a smaller size.
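(Editor's sketch, not from the thread: a common receiver-side mitigation for burst loss is enlarging the UDP socket receive buffer before binding. On Linux the effective value is capped by the net.core.rmem_max sysctl, which may also need raising; the 16 MiB figure is an arbitrary example.)

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Request a larger kernel receive buffer; Linux caps it at net.core.rmem_max.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 16 * 1024 * 1024)
sock.bind(('', 50446))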

mm2s can be reused. How do you know that the kernel is done? Did you try checking the status?
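(Editor's sketch, not from the thread: one way to confirm a run is done before restarting, assuming the handle returned by start() exposes a blocking wait() as in recent pynq releases for Alveo. Buffer names reuse the TX snippet above.)

# Wait for the first run to complete, then reuse the same kernel.
mm2s_wh = mm2s.start(mm2s_buf, size, 0)
mm2s_wh.wait()

mm2s_buf.sync_to_device()                # refill/resync the buffer as needed
mm2s_wh = mm2s.start(mm2s_buf, size, 0)  # second transfer, no xclbin reload
mm2s_wh.wait()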

What board are you using?

Mario

byeongkeonLee commented 2 years ago

Really appreciate your quick reply!!

The second issue was resolved after updating pynq!! Thank you!

For the first issue, I still wonder where the bottleneck is. Moreover, I want to adapt this to network-intensive applications. Could you recommend a solution other than decreasing the data size?

mariodruiz commented 2 years ago

The problem is not in the FPGA; there are extensive embedded probes that you can use to verify packet performance.

You need to profile and verify what's going on on the server side. Unfortunately, this is something I cannot help with.

byeongkeonLee commented 2 years ago

Thank you!

Actually, I called mm2s.start several times with a small buffer, and the throughput seems acceptable, although I haven't measured it properly yet. Again, thank you!
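(Editor's sketch of the chunked-send workaround described above, not from the thread. host_data, the chunk size, and the wait() call on the run handle are all assumptions; names otherwise reuse the earlier TX snippet.)

CHUNK = 1408 * 100  # hypothetical: 100 packets per kernel invocation
chunk_buf = pynq.allocate((CHUNK, 1), dtype=np.uint8, target=ol.bank1)

for off in range(0, size, CHUNK):
    n = min(CHUNK, size - off)
    chunk_buf[:n] = host_data[off:off + n]  # host_data: full payload on the host
    chunk_buf.sync_to_device()
    wh = mm2s.start(chunk_buf, n, 0)
    wh.wait()                               # finish this chunk before the next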