Xilinx / xup_vitis_network_example

VNx: Vitis Network Examples

Unstable performance of basic example #4

Closed xuwenquan0 closed 4 years ago

xuwenquan0 commented 4 years ago

Sorry to bother you again. I ran the basic example on a U280 and found that the same code gives different results from run to run. In detail, I set up two UDP sockets on the U280 and executed "Move data from the HOST through the network (NIC) to the Alveo card", but it does not always succeed. We captured packets on the host and traced the data flow through the switch: the packets were successfully delivered to the U280 interface, yet sometimes "s2mm" was still waiting. Has anyone else encountered this problem, or is it caused by hardware instability? (Sometimes a reboot leads to different results, and so does running for a long time.) This is my code: vnx-basic.txt

mariodruiz commented 4 years ago

Hi @xuwenquan0,

Your code looks fine, except this line sock.bind(('192.168.0.1', SW_PORT)). Is 192.168.0.1 the NIC IP address?

The transport protocol is UDP, so if there is packet loss, s2mm will keep waiting until it receives all the data specified. The network layer includes counters for the different stages, so you can verify whether all the packets are getting to the application. Check these registers: https://github.com/Xilinx/xup_vitis_network_example/blob/1a7dad112b81948050bee3576a2df29dc7287d99/NetLayers/kernel.xml#L54

Mario
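Because s2mm waits for an exact byte count, it helps to know precisely how many datagrams and bytes the host hands to the network. The following is a minimal host-side sketch, not code from the repository: the helper names `split_payload` and `send_chunks` are hypothetical, and the 1408-byte payload size is an assumption borrowed from the kernel posted later in this thread.

```python
import socket

PAYLOAD_BYTES = 1408  # assumed per-datagram payload, as used later in this thread


def split_payload(data, chunk=PAYLOAD_BYTES):
    """Split data into datagram-sized chunks; the last chunk may be shorter."""
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]


def send_chunks(sock, chunks, addr):
    """Send each chunk as one UDP datagram; return the bytes handed to the OS."""
    return sum(sock.sendto(chunk, addr) for chunk in chunks)


# Ten full-size datagrams, matching a receiver that expects 1408 * 10 bytes.
chunks = split_payload(bytes(PAYLOAD_BYTES * 10))
print(len(chunks), sum(len(c) for c in chunks))
```

The total printed here can be compared against the size given to the receiving kernel and against the packet counters in the network layer. UDP gives no delivery guarantee, so a match on the host side only rules out a sizing mistake, not loss in transit.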

xuwenquan0 commented 4 years ago

Hi @mariodruiz, thanks for replying. I have two more questions. First, when ### s2mm waits for packets, the whole ### vnx-basic module seems to be blocked, and the U280 does not respond even to ping. Second, sometimes when running the ### vnx-basic example, it wrongly returns true for both cmac_0 and cmac_1 even though only the physical NIC of cmac_0 is connected to the switch. In that case, ### initSocketTable and the code that follows report errors, and only a reboot fixes the issue.

mariodruiz commented 4 years ago

@xuwenquan0 I suppose ### means VNx?

  1. The ICMP module is independent of the UDP one, ping should work. I will verify it during the week.
  2. I have not seen that issue. Is CMAC_1 connected at all? Have you tried xbutil reset instead of rebooting the server?

When you observe 2., can you provide the dmesg and xbutil query full logs?

xuwenquan0 commented 4 years ago

@mariodruiz

  1. I captured packets on the host while s2mm was waiting. It seems that ARP does not work: the host cannot find the Alveo card and keeps broadcasting ARP packets.
  2. In this case, CMAC_0 is connected to the switch by a 100G fiber, while CMAC_1 is not connected to anything. I always use xbutil reset when the program is blocked because s2mm is waiting, but it does not help with this issue. Next time I will save the logs you mentioned if the issue happens again.
  3. I have set the received data size to 1480*10 to avoid UDP packet loss, but the problem remains. I will check the "udp_app_out_packets" register further.

Thanks a lot.

mariodruiz commented 4 years ago

@xuwenquan0,

ETH Zurich has just released a TCP/IP stack for Vitis. You may want to check it out:

https://github.com/fpgasystems/Vitis_with_100Gbps_TCP-IP

xuwenquan0 commented 4 years ago

Hi @mariodruiz I have replaced the s2mm, mm2s modules with our modified b_queue.cpp as follows: b_queue.txt, and I also modified Makefile, and config_files/connectivity_basic_if3.ini correspondingly (remove 's2mm', 'mm2s' and add 'b_queue').

However, I got an error during compilation: 'ERROR: [CFGEN 83-2285] --sc tag applied on b_queue0.k2n which is of the incorrect interface type, expecting axis master'. But if I uncomment lines 11, 14, 18, and 19 of b_queue.cpp, i.e., b_queue_new.txt, the compilation succeeds and the generated *.xclbin runs well on the Alveo card.

Could you provide us any suggestion or idea about how to fix it? Thanks!

mariodruiz commented 4 years ago

Do you want to do a queued loopback with the network? Can you share your connectivity_basic_if3.ini and maybe an image of your connections?

I also suggest making b_queue a free-running kernel, since as far as I can see you do not need any interaction with the host.

xuwenquan0 commented 4 years ago

Here is my connectivity_basic_if3.ini: connectivity_basic_if3.txt

xuwenquan0 commented 4 years ago

Sorry for my unfamiliarity with these operations. Could you tell me how to make b_queue a free-running kernel? Should I change the setting through the PYNQ overlay?

mariodruiz commented 4 years ago

It is a hardware configuration, not a software one: https://www.xilinx.com/html_docs/xilinx2020_1/vitis_doc/streamingconnections.html#ariaid-title8

You need to replace #pragma HLS INTERFACE s_axilite port = return bundle = control with #pragma HLS interface ap_ctrl_none port=return

The config file looks fine

xuwenquan0 commented 4 years ago

I've replaced #pragma HLS INTERFACE s_axilite port = return bundle = control with #pragma HLS interface ap_ctrl_none port=return, and ran make distclean in both the xup_vitis_network_example folder and the Basic_kernels folder.

However, the same error remains when I execute make all DEVICE=xilinx_u280_xdma_201920_3 INTERFACE=3 DESIGN=basic under the xup_vitis_network_example folder, while make all under the Basic_kernels folder succeeds: 'ERROR: [CFGEN 83-2285] --sc tag applied on b_queue0.k2n which is of the incorrect interface type, expecting axis master'.

mariodruiz commented 4 years ago

Can you run kernelinfo on your b_queue.xo and report the result?

xuwenquan0 commented 4 years ago

=== Kernel Definition ===
name: b_queue
language: c
vlnv: xilinx.com:hls:b_queue:1.0
preferredWorkGroupSizeMultiple: 0
workGroupSize: 1
debug: false
containsDebugDir: 1
sourceFile: b_queue/cpu_sources/b_queue.cpp

=== Arg ===
name: n2k
addressQualifier: 4
id: 0
port: N2K
size: 0x8
offset: 0x0
hostOffset: 0x0
hostSize: 0x8
type: stream<ap_axiu<512, 1, 1, 16>, 0>&
memSize: 0x40
origName: n2k
origUse: variable

=== Arg ===
name: k2n
addressQualifier: 4
id: 1
port: K2N
size: 0x8
offset: 0x8
hostOffset: 0x0
hostSize: 0x8
type: stream<ap_axiu<512, 1, 1, 16>, 0>&
memSize: 0x40
origName: k2n
origUse: variable

=== Port ===
name: N2K
mode: read_only
dataWidth: 512
portType: stream

=== Port ===
name: K2N
mode: write_only
dataWidth: 512
portType: stream

xuwenquan0 commented 4 years ago

@mariodruiz Hello, is there any problem with this report result?

mariodruiz commented 4 years ago

No, the kernel looks fine. I need to try to reproduce the issue, and I need time to do so.

xuwenquan0 commented 4 years ago

Really appreciate! Thanks.

mariodruiz commented 4 years ago

@xuwenquan0, your code was missing a semicolon ; (unsigned int size) and the buffer did not have enough space.

This code should do it. Pragmas to specify interface types and pipelining are not necessary when compiling to xo. You can always use vitis_hls to import the code and synthesize it to check for errors. If you do so, make sure you select Vitis Kernel Flow.

#include "ap_axi_sdata.h"
#include "ap_int.h"
#include "hls_stream.h"

#define DWIDTH 512
#define TDWIDTH 16

typedef ap_axiu<DWIDTH, 1, 1, TDWIDTH> pkt;

extern "C" {
void b_queue(hls::stream<pkt> &n2k, 
             hls::stream<pkt> &k2n ){
#pragma HLS interface ap_ctrl_none port=return

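    // Total bytes to buffer: 20 packets of 1408 bytes each
    unsigned int size = 1408*20;
    unsigned int bytes_per_beat = (DWIDTH / 8);

data_mover:
    pkt v_in;
    pkt v_out;
    // 1408*20 bytes / 64 bytes per beat = 440 beats
    ap_uint<DWIDTH> buffer[440];

    // Store the incoming beats
    for (unsigned int i = 0; i < (size / bytes_per_beat); i++) {
        n2k.read(v_in);
        buffer[i] = v_in.data;
    }

    // Replay the buffered beats to the network
    for (unsigned int i = 0; i < (size / bytes_per_beat); i++) {
        v_out.data = buffer[i];
        v_out.keep = -1;
        // Assert TLAST on the final beat and at every 1408-byte packet boundary
        if ( (((size / bytes_per_beat) - 1)==i) || ((((i + 1) * DWIDTH/8) % 1408) == 0)) {
            v_out.last = 1;
        } else {
            v_out.last = 0;
        }
        v_out.dest = 1;
        k2n.write(v_out);
    }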
}
}

Mario
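To see which beats the kernel above marks as the end of a packet, the TLAST condition can be modeled on the host. This is only a sketch in Python of the arithmetic in the second loop, assuming the same constants (512-bit stream, 1408-byte packets, 20 packets); the function name `last_flags` is hypothetical.

```python
DWIDTH = 512                   # stream width in bits, as in the kernel
BYTES_PER_BEAT = DWIDTH // 8   # 64 bytes per beat
PACKET_BYTES = 1408            # payload bytes per packet, as in the kernel
SIZE = PACKET_BYTES * 20       # total bytes buffered by the kernel


def last_flags(size=SIZE, packet_bytes=PACKET_BYTES):
    """Return the TLAST value of every output beat, mirroring the kernel's condition."""
    beats = size // BYTES_PER_BEAT
    flags = []
    for i in range(beats):
        end_of_transfer = i == beats - 1
        end_of_packet = ((i + 1) * BYTES_PER_BEAT) % packet_bytes == 0
        flags.append(1 if (end_of_transfer or end_of_packet) else 0)
    return flags


flags = last_flags()
print(len(flags))                                  # 440 beats, the buffer depth
print(sum(flags))                                  # 20 packets
print([i for i, f in enumerate(flags) if f][:3])   # [21, 43, 65]
```

Since 1408 bytes is exactly 22 beats of 64 bytes, TLAST falls on every 22nd beat and the 440-entry buffer holds the 20 packets with nothing left over.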

xuwenquan0 commented 4 years ago

It works fine, thanks a lot!

xuwenquan0 commented 4 years ago

Another problem occurred when I ran the queued loopback example (my modified code). The host sent packets to the U280 Alveo card and got packets back. However, when I parsed them, I found that each returned packet is shifted left by 64 bytes. For example, the sent packet is a 1408-byte string 'a' + 1406 'c' + 'b' (the numbers are lengths), and the returned packet is always 1343 'c' + 'b' + 'a' + 63 'c'. I also tried filling the packet with integers using struct.pack() and parsing it with struct.unpack(), and the result was the same, i.e., a 64-byte left shift. I ran the experiment many times, and the 64-byte shift is consistent.
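The pattern described above can be checked on the host with a short sketch. The helpers `rotate_left` and `find_rotation` are hypothetical names, not part of the example notebooks; they treat the payload as a circular buffer and search for the offset that maps the sent bytes onto the received ones.

```python
def rotate_left(data, n):
    """Rotate a bytes object left by n positions."""
    if not data:
        return data
    n %= len(data)
    return data[n:] + data[:n]


def find_rotation(sent, received):
    """Return the smallest left rotation that turns sent into received, or None."""
    if len(sent) != len(received):
        return None
    for n in range(len(sent)):
        if rotate_left(sent, n) == received:
            return n
    return None


sent = b"a" + b"c" * 1406 + b"b"
received = b"c" * 1343 + b"b" + b"a" + b"c" * 63
print(find_rotation(sent, received))  # 64
```

The observed payload is therefore the sent payload rotated left by exactly 64 bytes, which is one beat of the 512-bit stream.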

mariodruiz commented 4 years ago

I am sorry. I cannot help you debugging your application.

xuwenquan0 commented 4 years ago

Sorry to bother you. I wanted to know whether you have encountered a similar problem in the basic example code. I noticed that the vnx-basic example sends a random packet to the Alveo card, delivers it to the host over PCIe, and uses np.array_equal to check whether the packet changed in transit. Does that mean the original basic example does not have this problem, so I should check my own b_queue.cpp? Or could the processing in the network stack change the packet? Thank you anyway!

xuwenquan0 commented 4 years ago

I have found the reason. It seems that loading a new xclbin does not reset the buffer on the board, which causes the byte shift. Only running sudo /opt/xilinx/xrt/bin/xbutil reset fully resets the board. I will keep that in mind. Thanks. 👍