amd / xdna-driver

Add performance scripts #238

Closed maxzhen closed 2 months ago

maxzhen commented 3 months ago

This PR adds two shell scripts for performance analysis based on the built-in trace events in XRT. After the amdxdna plug-in package is installed, the scripts can be found under /opt/xilinx/xrt/amdxdna. They rely on the Linux 'perf' command, so it must be available in your PATH.
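
Since the scripts depend on 'perf' being reachable through PATH, a quick pre-flight check can save a failed run. The snippet below is only an illustrative sketch and is not part of this PR; the helper name `perf_available` is hypothetical.

```python
import shutil


def perf_available(cmd="perf"):
    """Return True if `cmd` resolves to an executable on the current PATH."""
    # shutil.which returns the full path of the executable, or None if absent.
    return shutil.which(cmd) is not None


if __name__ == "__main__":
    if perf_available():
        print("perf found at", shutil.which("perf"))
    else:
        print("perf not found in PATH; install it before running the scripts")
```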

Example:

Let's first collect performance data from the xrt-smi validate -r latency test:

# /opt/xilinx/xrt/amdxdna/npu_perf_trace.sh /opt/xilinx/xrt/bin/xrt-smi validate -d -r latency
[INFO]: Found NPU device 0000:c5:00.1 at /sys/kernel/debug/accel
[INFO]: XRT SDT is removed
[INFO]: XRT SDT is added
[INFO]: perf record -e amdxdna_trace:* -e sdt_xrt:*  -a /opt/xilinx/xrt/bin/xrt-smi validate -d -r latency
Validate Device           : [0000:c5:00.1]
    Platform              : RyzenAI-npu4
    Power Mode            : Default
-------------------------------------------------------------------------------
Verbose: Enabling Verbosity
Test 1 [0000:c5:00.1]     : latency                                             
    Description           : Run end-to-end latency test
    Xclbin                : /opt/xilinx/xrt/amdxdna/bins/17f0_10/validate.xclbin
    Details               : Kernel name is 'DPU_PDI_0'
                            Instruction size: '20' bytes
                            No. of iterations: '10000'
                            Average latency: '46.4' us
    Test Status           : [PASSED]
-------------------------------------------------------------------------------
Validation completed
[ perf record: Woken up 65 times to write data ]
[ perf record: Captured and wrote 17.190 MB perf.data (170133 samples) ]
[INFO]: XRT SDT is removed

Now, let's take a look at the average time between xrt::run.start() and xrt::run.wait2(), skipping the first 100 events since they may be slower while the CPU frequency ramps up:

# /opt/xilinx/xrt/amdxdna/npu_perf_analyze.sh 100: "sdt_xrt:xrt_run_start_enter:" "sdt_xrt:xrt_run_wait2_exit:"
Parsing perf.converted.out...
10000 events for: 'sdt_xrt:xrt_run_start_enter:'
10000 events for: 'sdt_xrt:xrt_run_wait2_exit:'
Average over 9900 events: 44us
Largest: 121us@5901
Smallest: 28us@551
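
For readers who want to see the kind of arithmetic behind this summary, here is a minimal Python sketch. It is not the implementation of npu_perf_analyze.sh. It assumes the i-th start event is paired with the i-th end event, that `100:` means "drop the first 100 pairs", and that the number after `@` is the event's index in the unsliced list. The function name `summarize_latency` and the sample timestamps are hypothetical.

```python
from statistics import mean


def summarize_latency(start_us, end_us, skip=100):
    """Pair the i-th start timestamp with the i-th end timestamp and
    summarize the deltas, ignoring the first `skip` pairs (warm-up)."""
    if len(start_us) != len(end_us):
        raise ValueError("start and end event counts differ")
    # Latency of each start/end pair, in the same unit as the inputs.
    deltas = [end - start for start, end in zip(start_us, end_us)]
    # Keep the original event index alongside each delta (assumed convention).
    kept = list(enumerate(deltas))[skip:]
    if not kept:
        raise ValueError("no events left after skipping")
    largest = max(kept, key=lambda pair: pair[1])
    smallest = min(kept, key=lambda pair: pair[1])
    return {
        "count": len(kept),
        "average": mean(delta for _, delta in kept),
        "largest": (largest[1], largest[0]),    # (delta, event index)
        "smallest": (smallest[1], smallest[0]),  # (delta, event index)
    }


if __name__ == "__main__":
    # Synthetic timestamps in microseconds; the first pair mimics a slow warm-up.
    starts = [0, 100, 200, 300, 400]
    ends = [90, 150, 230, 345, 440]
    print(summarize_latency(starts, ends, skip=1))
```

Skipping by position rather than by a latency threshold keeps the warm-up exclusion independent of the measured values, which matches the reasoning given above for dropping the first 100 events.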