Xilinx / Vitis-Tutorials

Vitis In-Depth Tutorials
https://Xilinx.github.io/Vitis-Tutorials/
MIT License
1.22k stars 553 forks source link

Problem with depth in Hardware kernel code of Vitis-Tutorials/Hardware_Acceleration/Design_Tutorials/01-convolution-tutorial #119

Open Ali-Flt opened 3 years ago

Ali-Flt commented 3 years ago

Hi, I've read the whole tutorial and understood all of it. But an issue I have is that the depth of output_stream in the top-level function of the filter2d_hw.cpp file is set to 64. image

Why is this? Is it 64 so that the M_AXI interface can perform data width auto-widening to 512 (64 x 8) bits? If so then why is the input stream (pixel_stream) depth not set to 64 too and only the stream for output has 64 depth?

What I assumed was that even if the depth is 2, the M_AXI interface will be able to understand that the data is flowing in the kernel without stall, therefore achieving 512 bit data width for burst transfers.

I also synthesized the code for both 2 and 64 as depth value for output_stream and both systems pass the C/RTL Co-simulation without problem. The only difference in the synthesis reports I saw was that the FIFOs used 51 more LUTs which is because of the output_stream FIFO depth of 64.

P.S.: Although the FIFO depth is 32 times more (2 x 32 = 64), the number of additional LUTs used is less than two times more (51). So I'm assuming HLS has understood that it doesn't need all of the 64 FIFOs. But even the slight addition of LUTs is strange to me because it should not need more than depth=2.

Also there is a hint comment saying you can set the stream depth to 0 for minimum resources, how can the FIFO be created with a depth of 0? should the depth not be at least 1? When I set depth to 0 suddenly there are no BRAMs used for the window_stream but for Depth of 3, 100 BRAMs are used. Why?

Thanks, Ali

allyzhou commented 3 years ago

We're looking at this issue and will get back to you soon. Thanks.

akhilayyagari commented 3 years ago

Hi @Ali-Flt - I have extracted the independent questions and answered them independently

Why is it necessary to have a FIFO depth of "64" instead of "2"

The Dataflow throughput is dynamic in nature and it is dependent on the FIFO full and empty conditions. When the user selects the FIFO of depth "2" they would see a small stall percentage because of FIFO getting full ( can be seen in the dataflow profiling information). To remove this stall, the author has increased it to"64", but the depth can go as low as "16" without any stall percentage.

How the MAXI interface can infer a 512-bit data width for burst transfers

the inferences of the 512-bit width is not dependent on the FIFO depth, it is dependent on the memory access pattern of the "dst" variable which is sequential. This infers the 512-bit width, there can be cases where the memory access pattern is random which fails the inferences

Resource utilization of the FIFO for a depth of "64" and "2"

The resource utilization of HLS is only an estimate, run the vivado to get more accurate results

FIFO of the depth of "0"

This comment will be removed, you cannot set a FIFO of depth "0", the default would be "2"

Ali-Flt commented 3 years ago

Hi @akhilayyagari , Thank you very much for the detailed answer. If I've understood correct about the FIFO depth, the right flow of configuring the depths will be to start from a high number (maximum number of data that will be pushed into each FIFO) and run C/RTL co-simulation with buffer profiling and lower the depth until stall percentage goes higher than 0.

About the setting FIFO depth to 0 comment, I was assuming that maybe by setting the depth to 0, the depth will be calculated by DATAFLOW and therefore the dataflow fifo depth will override the user input (0). I ran the synthesis with depth of 0 and the depth of the FIFO was actually set to 2, using no BRAMs.

But about you saying you can't set the FIFO depth to "0", I actually changed the config_dataflow setting "-fifo_depth" to 0 and I observed that it synthesized the design without any errors. And the fifo depth in the synthesis resource profiling report was set to 0 too.

How is this possible? Shouldn't the fifo depth be at least 1? There could be two scenarios:

  1. The system is actually synthesized with FIFO depth = 0 somehow and dataflow handled data transfer between registers without the need of buffers/fifos
  2. The system is synthesized with a depth higher than 0 and the "-fifo_depth 0" setting was ignored and the synthesis report is wrong.

Which one is the correct scenario?

Thanks again, Ali

akhilayyagari commented 3 years ago

hi @Ali-Flt ,

There are two ways you can optimize the FIFO depth,

The second issue you have reported may be a bug, it should not take the "0" which does not make sense. We will fix this in the future release. The minimum depth should be "1", the tool default is "2". if the depth of the FIFO is less than 1024, then they use registers unless explicitly specified using a pragma.