Closed rahulj175 closed 2 years ago
Correct me if I am wrong, but I think the host<->mem part is a bit outside the scope of what is provided in that notebook. You are emulating the kernel execution in Vivado, so anything that happens outside of the kernel execution scope is not included in the waveform.
I am actually new to the PYNQ & Vitis flow. I want to baseline the timings of host API calls (which wrap the OpenCL APIs) against the kernel execution time using the PYNQ flow.
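For what it's worth, a first-order baseline of host-side call overhead can be taken from Python itself with `time.perf_counter`, independent of the profiler. This is only a minimal sketch: `run_kernel` below is a hypothetical stand-in for a real accelerator invocation (e.g. a PYNQ kernel call), not a PYNQ API.

```python
import time

def run_kernel(data):
    # Hypothetical stand-in for the accelerated kernel; here just a software loop.
    return [x * 2 for x in data]

def timed_call(fn, *args):
    """Return (result, elapsed_seconds) for a single host-side call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

data = list(range(1000))
result, elapsed = timed_call(run_kernel, data)
print(f"host call took {elapsed * 1e6:.1f} us")
```

Wrapping each buffer sync and kernel start this way gives wall-clock numbers that can be compared against the kernel execution time reported by the emulation waveform.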
I was going through the tutorials & material available for Vitis. In one of the Xilinx documents https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_2/ug1393-vitis-application-acceleration.pdf on page 210, I found that when doing hardware emulation with profiling enabled (using a conventional host.cpp) there is a section in the waveform window that captures data flow from host to global memory. I just wanted to confirm whether this is intrinsically supported in the PYNQ flow; if that is the case, I might have missed some config setting.
There is one example notebook, "1-efficient-accelerator-scheduling", where the concept of overlapping compute and data transfer (pipelining) is explained. If we can baseline host call timings against the kernel execution time (as with a conventional host.cpp), we can do this overlapping more efficiently. Let me know if I am mistaken here.
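The overlapping idea can be illustrated in plain Python, with no PYNQ dependency: while the "kernel" processes batch N, the host "transfers" batch N+1 in a background thread. All names here (`transfer`, `compute`) are illustrative stand-ins for buffer syncs and kernel calls, not PYNQ APIs.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def transfer(batch):
    # Stand-in for a host -> device copy.
    time.sleep(0.01)
    return batch

def compute(batch):
    # Stand-in for kernel execution.
    time.sleep(0.01)
    return [x + 1 for x in batch]

batches = [[i] * 4 for i in range(4)]
results = []
with ThreadPoolExecutor(max_workers=1) as pool:
    next_xfer = pool.submit(transfer, batches[0])
    for i in range(len(batches)):
        ready = next_xfer.result()  # wait for batch i to arrive
        if i + 1 < len(batches):
            # Kick off the next transfer before computing, so the copy
            # of batch i+1 overlaps the compute on batch i.
            next_xfer = pool.submit(transfer, batches[i + 1])
        results.append(compute(ready))
print(results)
```

With per-call timings from the host side, you can check whether the transfer of batch N+1 really hides behind the compute of batch N, which is exactly the scheduling decision the notebook discusses.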
That should be kernel to global memory, not host. As the first paragraph on the previous page states: "The Vitis core development kit can generate a Waveform view when running hardware emulation. It displays in-depth details at the system level, CU level, and at the function level. The details include data transfers between the kernel and global memory and data flow through inter-kernel pipes. These details provide many insights into performance bottlenecks from the system level down to individual function calls to help optimize your application."
There is a description which follows this image on page 211. It reads:

"The hierarchy of the Waveform and Live Waveform views includes the following:
- Device "name": Target device name.
- Binary Container "name": Binary container name.
- Memory Data Transfers: For each DDR bank, this shows the trace of all the read and write request transactions arriving at the bank from the host.
- Kernel "name" 1:1:1: For each kernel and for each compute unit of that kernel, this section breaks down the activities originating from the compute unit."
I will try it out using the conventional C++ flow with host.cpp and update here.
Unfortunately the way memory profiling is done in Vitis is OpenCL-specific. It's something we're looking into, but I can't give a timeline for when it might be supported. We're somewhat at the mercy of the lower-level libraries we build on for this type of functionality.
Thanks Peter for the clarification.
I am trying to analyze the waveform generated from the PYNQ-ALVEO example notebook "2hardware-emulation.ipynb". In the waveform I observed that there is no section showing the host interacting with global memory. Is it that Python host profiling is not yet supported, and the profiler expects the host program to be written in C++?