cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0

Enhancement for HLS Report Retrieval for Multiple Functions #435

Open chhzh123 opened 2 years ago

chhzh123 commented 2 years ago

The current report API only supports retrieving the report of a single kernel. However, the generated code may contain multiple functions, either implicitly or explicitly, which causes errors when the report is retrieved and displayed.

An example is shown below. Since inter-kernel data placement is enabled, a dataflow pragma is generated at the top of the HLS C code, and the HLS tool may automatically extract a subfunction for each of those stages.

import heterocl as hcl
import numpy as np

def test_stages():

    A = hcl.placeholder((32, 32), "A")
    C = hcl.placeholder((32, 32), "C")
    def kernel(A, C):
        B = hcl.compute(A.shape, lambda i, j : A[i, j] + 1, "B")
        D = hcl.compute(A.shape, lambda i, j : B[i, j] + 1, "D")
        E = hcl.compute(A.shape, lambda i, j : C[i, j] + 1, "E")
        F = hcl.compute(A.shape, lambda i, j : D[i, j] + E[i, j], "F")
        return F

    target = hcl.Platform.xilinx_zc706
    target.config(compiler="vivado_hls", mode="csyn", project="stages-tvm.prj")
    s = hcl.create_schedule([A, C], kernel)
    s.to(kernel.B, s[kernel.D])
    s.to(kernel.D, s[kernel.F])
    s.to(kernel.E, s[kernel.F])
    mod = hcl.build(s, target=target)
    np_A = np.zeros((32, 32))
    np_C = np.zeros((32, 32))
    np_F = np.zeros((32, 32))
    hcl_A = hcl.asarray(np_A)
    hcl_C = hcl.asarray(np_C)
    hcl_F = hcl.asarray(np_F)
    mod(hcl_A, hcl_C, hcl_F)
    report = mod.report()
    report.display()

if __name__ == "__main__":
    test_stages()

As a result, the top function contains no loops, only several function instances (see below). The results for the original loops end up in separate .rpt files such as Loop_B.rpt, Loop_D.rpt, Loop_E.rpt, etc.

    + Detail: 
        * Instance: 
        +--------------------+-----------------+---------+---------+-----------+-----------+------+------+---------+
        |                    |                 |  Latency (cycles) |   Latency (absolute)  |   Interval  | Pipeline|
        |      Instance      |      Module     |   min   |   max   |    min    |    max    |  min |  max |   Type  |
        +--------------------+-----------------+---------+---------+-----------+-----------+------+------+---------+
        |Loop_E_i_proc4_U0   |Loop_E_i_proc4   |     3137|     3137| 31.370 us | 31.370 us |  3137|  3137|   none  |
        |Loop_B_i1_proc5_U0  |Loop_B_i1_proc5  |     3137|     3137| 31.370 us | 31.370 us |  3137|  3137|   none  |
        |Loop_F_i3_proc7_U0  |Loop_F_i3_proc7  |     2113|     2113| 21.130 us | 21.130 us |  2113|  2113|   none  |
        |Loop_D_i2_proc6_U0  |Loop_D_i2_proc6  |     2113|     2113| 21.130 us | 21.130 us |  2113|  2113|   none  |
        +--------------------+-----------------+---------+---------+-----------+-----------+------+------+---------+

        * Loop: 
        N/A
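
The instance table above already names the submodules whose reports would have to be consulted. As a rough illustration only (this is not HeteroCL's parser, and the function name `parse_instance_table` is hypothetical), the following sketch pulls the instance name, module name, and cycle latencies out of a text table laid out like the one shown. It assumes nine columns per data row and numeric latency cells; rows with non-numeric latencies are skipped.

```python
def parse_instance_table(report_text):
    """Return one dict per data row of an HLS 'Instance' latency table.

    Hypothetical helper: assumes the 9-column layout shown in the report
    excerpt (instance, module, 2x cycle latency, 2x absolute latency,
    2x interval, pipeline type).
    """
    rows = []
    for line in report_text.splitlines():
        line = line.strip()
        # Data and header rows are delimited by '|'; border rows start with '+'.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        # Skip the two header rows: the first has merged cells (fewer than 9),
        # the second has 'min'/'max' instead of numbers.
        if len(cells) != 9 or not cells[2].isdigit() or not cells[3].isdigit():
            continue
        rows.append({
            "instance": cells[0],
            "module": cells[1],
            "latency_min": int(cells[2]),
            "latency_max": int(cells[3]),
        })
    return rows
```

Each returned `module` value (for example `Loop_B_i1_proc5`) could then serve as the key for looking up that submodule's own report, although how those files are named and located is not specified here.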

Thus, HeteroCL runs into an error since there are no loops available in the top .rpt file.

Traceback (most recent call last):
  File "test_stages.py", line 41, in <module>
    test_stages()
  File "test_stages.py", line 36, in test_stages
    report = mod.report()
  File "/scratch/users/hc676/heterocl/python/heterocl/tvm/module.py", line 94, in report
    return report_stats(target, self.name)
  File "/scratch/users/hc676/heterocl/python/heterocl/report.py", line 470, in report_stats
    return parse_xml(path, "Vivado HLS")
  File "/scratch/users/hc676/heterocl/python/heterocl/report.py", line 456, in parse_xml
    summary = perf_estimate["SummaryOfLoopLatency"]
KeyError: 'SummaryOfLoopLatency'

It would be great if we could collect the results from the different files and display them in a single table, as we do for a single kernel.
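
To make the request concrete, here is a minimal sketch of the merging step, not a proposed patch. It assumes each function's report has already been parsed into a plain dict, and that the only shared convention is the `SummaryOfLoopLatency` key seen in the traceback. The function name `collect_loop_summaries` and the `{function name: parsed dict}` input shape are hypothetical, and locating and parsing the per-function report files is left out.

```python
def collect_loop_summaries(perf_estimates):
    """Merge per-function loop summaries into one flat list of rows.

    perf_estimates is a hypothetical mapping of
    {function name: parsed performance-estimate dict}.
    Functions without a loop summary (such as a dataflow top function)
    are skipped instead of raising KeyError.
    """
    rows = []
    for func_name, perf in perf_estimates.items():
        # .get() tolerates a missing key; 'or {}' tolerates an empty entry.
        summary = perf.get("SummaryOfLoopLatency") or {}
        for loop_name, stats in summary.items():
            # Keep the loop's statistics untouched so no field names
            # are assumed beyond what the parsed report provides.
            rows.append({"function": func_name, "loop": loop_name, "stats": stats})
    return rows
```

The resulting rows carry the originating function name alongside each loop, so a single combined table can still show which subfunction a loop came from.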

If you are free, could you take a look at this? @yn224 Thanks!