Xilinx / XRT

Run Time for AIE and FPGA based platforms
https://xilinx.github.io/XRT

Incorrect behavior of kernel events in 2021.1 on edge #6193

Open doonny opened 2 years ago

doonny commented 2 years ago

I am upgrading from 2020.1 to 2021.1, and I have found inconsistent behavior of XRT on the ZCU102 edge device.

I am testing two parallel kernels in a producer/consumer arrangement that moves data from DDR back to DDR as a loopback test. However, the kernel event for one kernel waits much longer than the one for the other kernel, as shown in the trace below.
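For context, both kernels are started from the host and both runs are waited on. A minimal sketch of such a host launch with the XRT native C++ API is shown here; the xclbin name, buffer sizes, `data_num` value, and argument indices are assumptions, not the actual host code from this report.

```cpp
// Minimal host-side sketch (assumed names/sizes): start DataLoad and DataStore
// together and wait for both, since they exchange data over AXI streams.
#include <cstddef>
#include "xrt/xrt_bo.h"
#include "xrt/xrt_device.h"
#include "xrt/xrt_kernel.h"

int main() {
    constexpr unsigned int data_num = 1024;           // number of 512-bit vectors (assumed)
    constexpr std::size_t  bytes    = data_num * 64;  // VEC_SIZE * 32 bits = 64 bytes per vector

    auto device = xrt::device(0);
    auto uuid   = device.load_xclbin("loopback.xclbin");   // assumed xclbin name
    auto load   = xrt::kernel(device, uuid, "DataLoad");
    auto store  = xrt::kernel(device, uuid, "DataStore");

    // Device buffers bound to each kernel's m_axi argument groups
    auto a = xrt::bo(device, bytes, load.group_id(0));
    auto c = xrt::bo(device, bytes, load.group_id(1));
    auto b = xrt::bo(device, bytes, store.group_id(0));
    auto d = xrt::bo(device, bytes, store.group_id(1));

    // (input data initialization omitted)
    a.sync(XCL_BO_SYNC_BO_TO_DEVICE);
    c.sync(XCL_BO_SYNC_BO_TO_DEVICE);

    // Set only the pointer/scalar arguments; the axis ports are connected
    // kernel-to-kernel at link time and are not set from the host.
    auto run_store = xrt::run(store);
    run_store.set_arg(0, b);
    run_store.set_arg(1, d);
    run_store.set_arg(2, data_num);
    run_store.start();

    auto run_load = xrt::run(load);
    run_load.set_arg(0, a);
    run_load.set_arg(1, c);
    run_load.set_arg(2, data_num);
    run_load.start();

    run_load.wait();
    run_store.wait();

    b.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
    d.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
    return 0;
}
```

Because the kernel-to-kernel stream FIFOs are shallow (depth 16), the two runs have to overlap; if one kernel were serialized behind the other, the design would stall.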

[screenshot] We can see that kernel DataStore needs to wait much longer to finish than DataLoad.

However, in the kernel profile section, we can see that both kernels transfer the same amount of data at a similar speed:

[screenshot of the kernel profile summary]

Moreover, the guidance page reports that kernel DataStore is not USED??? [screenshot]

For comparison, this is what it looks like in Vitis 2020.1, where the two kernels have the same execution time: [screenshot]

The kernel code is very simple:

DataLoad kernel:

```cpp
#include "ap_axi_sdata.h"
#include "ap_int.h"
#include "hls_stream.h"

#define VEC_SIZE 16

typedef ap_uint<VEC_SIZE*32> data_vec;
typedef ap_axiu<VEC_SIZE*32, 0, 0, 0> k2k_data;

extern "C" {
void DataLoad(
    const data_vec *A_in,
    const data_vec *C_in,
    const unsigned int data_num,
    hls::stream<k2k_data> &stream_out_0,
    hls::stream<k2k_data> &stream_out_1)
{
#pragma HLS INTERFACE m_axi port = A_in offset = slave bundle = gmem0 depth = 32 // group-0
#pragma HLS INTERFACE m_axi port = C_in offset = slave bundle = gmem1 depth = 32 // group-1
#pragma HLS INTERFACE axis port = stream_out_0 depth = 16
#pragma HLS INTERFACE axis port = stream_out_1 depth = 16

    k2k_data tmp1, tmp2;

    for (unsigned int i = 0; i < data_num; i++) {
        tmp1.data = A_in[i];
        tmp2.data = C_in[i];
        // blocking streaming access
        stream_out_0.write(tmp1);
        stream_out_1.write(tmp2);
    }
}
}
```

DataStore kernel:

```cpp
#include "ap_axi_sdata.h"
#include "ap_int.h"
#include "hls_stream.h"

#define VEC_SIZE 16

typedef ap_uint<VEC_SIZE*32> data_vec;
typedef ap_axiu<VEC_SIZE*32, 0, 0, 0> k2k_data;

extern "C" {
void DataStore(
    data_vec *B_out,  // HBM[1]
    data_vec *D_out,  // HBM[3]
    const unsigned int data_num,
    hls::stream<k2k_data> &stream_in_0,
    hls::stream<k2k_data> &stream_in_1)
{
#pragma HLS INTERFACE m_axi port = B_out offset = slave bundle = gmem2 // group-0
#pragma HLS INTERFACE m_axi port = D_out offset = slave bundle = gmem3 // group-1
#pragma HLS INTERFACE axis port = stream_in_0 depth = 16
#pragma HLS INTERFACE axis port = stream_in_1 depth = 16

    k2k_data tmp;

    for (unsigned int i = 0; i < data_num; i++) {
        tmp = stream_in_0.read();
        B_out[i] = tmp.data;
        tmp = stream_in_1.read();
        D_out[i] = tmp.data;
    }
}
}
```
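The linking configuration is not included above. The two kernels are connected kernel-to-kernel through their axis ports, which in the Vitis flow is expressed with `stream_connect` entries in a v++ config file. A sketch of such a section, assuming the default compute-unit instance names `DataLoad_1` and `DataStore_1`:

```
[connectivity]
# instance names below are assumptions (default <kernel>_1 naming)
stream_connect=DataLoad_1.stream_out_0:DataStore_1.stream_in_0
stream_connect=DataLoad_1.stream_out_1:DataStore_1.stream_in_1
```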

stsoe commented 2 years ago

@jvillarre Can you help comment on this?

doonny commented 2 years ago

Any updates on this issue?

uday610 commented 1 year ago

@doonny, unfortunately this has fallen off our radar. Frankly speaking, 2021.1 is too old a release for us to investigate. If you can try the latest 2022.1 release and still see the problem, that would be great.