Xilinx / Alveo-PYNQ

Introductory examples for using PYNQ with Alveo
Apache License 2.0

Query on hardware emulation profiling with PYNQ #16

Closed rahulj175 closed 2 years ago

rahulj175 commented 3 years ago

I am trying to analyze the waveform generated from the Alveo-PYNQ example notebook "2hardware-emulation.ipynb". In the waveform I observed that there is no section showing the host interacting with global memory. Is it that Python host profiling is not yet supported, and the profiler expects the host program to be coded in C++?

giunatale commented 3 years ago

Correct me if I am wrong, but I think the host<->memory part is a bit out of scope for what is provided in that notebook. You are emulating the kernel execution in Vivado, so anything that happens outside of the kernel execution scope is not included in the waveform.

rahulj175 commented 3 years ago

I am actually new to the PYNQ and Vitis flow. I want to baseline host API (wrapped over OpenCL APIs) call timings with respect to the kernel execution time using the PYNQ flow.
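As a first approximation, independent of the emulation waveform, host-observed call timings can be taken with wall-clock timers around the calls. A minimal sketch, where `run_kernel` is only a stand-in for an actual PYNQ kernel call (e.g. something like `ol.vadd_1.call(...)` in a real host script; the names here are illustrative, not from the notebook):

```python
import time

def run_kernel():
    # Placeholder for a real PYNQ kernel invocation such as
    # ol.vadd_1.call(in_buf, out_buf, size); sleep simulates work.
    time.sleep(0.01)

# Time the host-observed duration of the call.
t0 = time.perf_counter()
run_kernel()
t1 = time.perf_counter()
kernel_ms = (t1 - t0) * 1000
print(f"host-observed call time: {kernel_ms:.2f} ms")
```

The same pattern can bracket buffer transfers and kernel calls separately, which already gives a coarse baseline of host overhead versus kernel time.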

I was going through the tutorials and material available for Vitis. In one of the Xilinx documents (https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_2/ug1393-vitis-application-acceleration.pdf, page 210), I found that when doing hardware emulation with profiling enabled (using a conventional host.cpp) there is a section in the waveform window which captures data flow from host to global memory. I just wanted to confirm whether this is intrinsically supported in the PYNQ flow; if so, I might have missed some config setting. [screenshot attached]
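For context, in the C++ flow these traces are usually switched on via an `xrt.ini` (formerly `sdaccel.ini`) placed next to the host executable. A sketch of the relevant settings, with option names assumed from the 2019.2-era XRT documentation (verify against your installed XRT version):

```ini
; xrt.ini -- placed in the working directory of the host application
[Debug]
profile=true
timeline_trace=true
data_transfer_trace=fine
```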

rahulj175 commented 3 years ago

There is one example notebook, "1-efficient-accelerator-scheduling", where the concept of overlapping compute and data transfer (pipelining) is explained. If we can baseline host call timings with respect to kernel execution time (as with a conventional host.cpp), we can do this overlapping more efficiently. Let me know if I am mistaken here.
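The pipelining pattern itself can be sketched generically. In a real PYNQ host script the transfer step would be a buffer sync to the device and the compute step a kernel call; here both are stand-in functions so only the overlap structure is shown (a sketch, not the notebook's code):

```python
from concurrent.futures import ThreadPoolExecutor

def transfer(chunk):
    # Stand-in for moving a data chunk to device memory.
    return chunk

def compute(chunk):
    # Stand-in for a kernel invocation on the transferred chunk.
    return sum(chunk)

chunks = [[1, 2], [3, 4], [5, 6]]
results = []
# Double buffering: while chunk i is being computed on, the transfer
# of chunk i+1 proceeds in the background worker thread.
with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(transfer, chunks[0])
    for nxt in chunks[1:]:
        ready = pending.result()              # wait for transfer of chunk i
        pending = pool.submit(transfer, nxt)  # start transfer of chunk i+1...
        results.append(compute(ready))        # ...while computing on chunk i
    results.append(compute(pending.result())) # drain the last chunk
print(results)  # [3, 7, 11]
```

Once per-stage host timings are known (as above), the chunk size can be chosen so transfer and compute times roughly match, which is where the overlap pays off most.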

giunatale commented 3 years ago

> I am actually new to the PYNQ and Vitis flow. I want to baseline host API (wrapped over OpenCL APIs) call timings with respect to the kernel execution time using the PYNQ flow.
>
> I was going through the tutorials and material available for Vitis. In one of the Xilinx documents (https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_2/ug1393-vitis-application-acceleration.pdf, page 210), I found that when doing hardware emulation with profiling enabled (using a conventional host.cpp) there is a section in the waveform window which captures data flow from host to global memory. I just wanted to confirm whether this is intrinsically supported in the PYNQ flow; if so, I might have missed some config setting.

That should be kernel to global memory, not host. As the first paragraph on the previous page states:

> The Vitis core development kit can generate a Waveform view when running hardware emulation. It displays in-depth details at the system level, CU level, and at the function level. The details include data transfers between the kernel and global memory and data flow through inter-kernel pipes. These details provide many insights into performance bottlenecks from the system level down to individual function calls to help optimize your application.

rahulj175 commented 3 years ago

There is a description which follows this image on page 211. It reads:

> The hierarchy of the Waveform and Live Waveform views include the following:
>
> - Device "name": Target device name.
> - Binary Container "name": Binary container name.
> - Memory Data Transfers: For each DDR Bank, this shows the trace of all the read and write request transactions arriving at the bank from the host.
> - Kernel "name" 1:1:1: For each kernel and for each compute unit of that kernel, this section breaks down the activities originating from the compute unit.

I will try it out using the conventional C++ flow with a host.cpp and update here.

PeterOgden commented 3 years ago

Unfortunately, the way memory profiling is done in Vitis is OpenCL-specific. It's something we're looking into, but I can't give a timeline for when it might be supported. We're somewhat at the mercy of the lower-level libraries we build on for this type of functionality.

rahulj175 commented 3 years ago

Thanks Peter for the clarification.