The current report API only supports retrieving reports from a single kernel. However, the generated code may sometimes contain multiple functions, either implicitly or explicitly, which causes errors when displaying results.
An example is shown below. Since inter-kernel data placement is enabled, a dataflow pragma is generated at the top of the HLS C code, and the HLS tool may automatically extract a subfunction for each of those stages.
import heterocl as hcl
import numpy as np

def test_stages():
    A = hcl.placeholder((32, 32), "A")
    C = hcl.placeholder((32, 32), "C")

    def kernel(A, C):
        B = hcl.compute(A.shape, lambda i, j: A[i, j] + 1, "B")
        D = hcl.compute(A.shape, lambda i, j: B[i, j] + 1, "D")
        E = hcl.compute(A.shape, lambda i, j: C[i, j] + 1, "E")
        F = hcl.compute(A.shape, lambda i, j: D[i, j] + E[i, j], "F")
        return F

    target = hcl.Platform.xilinx_zc706
    target.config(compiler="vivado_hls", mode="csyn", project="stages-tvm.prj")
    s = hcl.create_schedule([A, C], kernel)
    s.to(kernel.B, s[kernel.D])
    s.to(kernel.D, s[kernel.F])
    s.to(kernel.E, s[kernel.F])
    mod = hcl.build(s, target=target)

    np_A = np.zeros((32, 32))
    np_C = np.zeros((32, 32))
    np_F = np.zeros((32, 32))
    hcl_A = hcl.asarray(np_A)
    hcl_C = hcl.asarray(np_C)
    hcl_F = hcl.asarray(np_F)
    mod(hcl_A, hcl_C, hcl_F)

    report = mod.report()
    report.display()

if __name__ == "__main__":
    test_stages()
As a result, the top function contains no loops, only several function instances (see below).
The results for the original loops end up in separate .rpt files such as Loop_B.rpt, Loop_D.rpt, Loop_E.rpt, etc.
+ Detail:
    * Instance:
    +--------------------+-----------------+---------+---------+-----------+-----------+------+------+---------+
    |                    |                 |  Latency (cycles) |   Latency (absolute)  |   Interval  | Pipeline|
    |      Instance      |      Module     |   min   |   max   |    min    |    max    |  min |  max |   Type  |
    +--------------------+-----------------+---------+---------+-----------+-----------+------+------+---------+
    |Loop_E_i_proc4_U0   |Loop_E_i_proc4   |     3137|     3137| 31.370 us | 31.370 us |  3137|  3137|   none  |
    |Loop_B_i1_proc5_U0  |Loop_B_i1_proc5  |     3137|     3137| 31.370 us | 31.370 us |  3137|  3137|   none  |
    |Loop_F_i3_proc7_U0  |Loop_F_i3_proc7  |     2113|     2113| 21.130 us | 21.130 us |  2113|  2113|   none  |
    |Loop_D_i2_proc6_U0  |Loop_D_i2_proc6  |     2113|     2113| 21.130 us | 21.130 us |  2113|  2113|   none  |
    +--------------------+-----------------+---------+---------+-----------+-----------+------+------+---------+

    * Loop:
    N/A
Thus, HeteroCL raises a KeyError, since the top-level report contains no loop entries:
Traceback (most recent call last):
  File "test_stages.py", line 41, in <module>
    test_stages()
  File "test_stages.py", line 36, in test_stages
    report = mod.report()
  File "/scratch/users/hc676/heterocl/python/heterocl/tvm/module.py", line 94, in report
    return report_stats(target, self.name)
  File "/scratch/users/hc676/heterocl/python/heterocl/report.py", line 470, in report_stats
    return parse_xml(path, "Vivado HLS")
  File "/scratch/users/hc676/heterocl/python/heterocl/report.py", line 456, in parse_xml
    summary = perf_estimate["SummaryOfLoopLatency"]
KeyError: 'SummaryOfLoopLatency'
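One possible starting point, as a rough sketch only: the Instance table in the top-level report already names the extracted modules, so it could be used to find the sub-reports to read. The function name `parse_instance_rows` and the returned dictionary keys are made up for illustration and are not part of HeteroCL; the parsing assumes the nine-column Instance layout shown above.

```python
def parse_instance_rows(report_text):
    """Extract instance/module names and cycle latencies from an Instance table.

    Assumes the nine-column layout of the table shown in this issue.
    Rows whose latency cells are not plain integers are skipped.
    """
    rows = []
    for line in report_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # borders, headings such as "* Instance:", and "N/A"
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Header rows either have fewer cells or non-numeric latency cells.
        if len(cells) != 9 or not (cells[2].isdigit() and cells[3].isdigit()):
            continue
        rows.append({
            "instance": cells[0],
            "module": cells[1],
            "latency_min": int(cells[2]),
            "latency_max": int(cells[3]),
        })
    return rows
```

The `module` values (for example `Loop_B_i1_proc5`) would then tell the report code which per-function reports to open.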
It would be great if we could collect the results from the different report files and display them in a single table, as we already do for a single kernel.
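A minimal sketch of the merging step, assuming each sub-function's report can be parsed into a dictionary shaped like `perf_estimate` in `parse_xml`, with a `SummaryOfLoopLatency` entry when that function contains loops. `merge_loop_summaries` and the `module/loop` key format are hypothetical, not existing HeteroCL APIs.

```python
def merge_loop_summaries(perf_by_module):
    """Combine per-module loop summaries into one flat dictionary.

    perf_by_module maps a module name to its parsed performance estimates
    (assumed to be a dict that may contain "SummaryOfLoopLatency").
    Returns (merged, missing): merged maps "module/loop" to the loop's
    stats, and missing lists the modules that reported no loops.
    """
    merged = {}
    missing = []
    for module, perf in perf_by_module.items():
        # .get avoids the KeyError when a report has no loop summary,
        # which is what happens for the dataflow top function.
        loops = perf.get("SummaryOfLoopLatency")
        if not isinstance(loops, dict) or not loops:
            missing.append(module)
            continue
        for loop_name, stats in loops.items():
            # Prefix with the module name so loops from different
            # functions cannot overwrite each other.
            merged[f"{module}/{loop_name}"] = stats
    return merged, missing
```

The merged dictionary could then be passed to the existing table-building code, while `missing` lets the caller report functions without loops instead of raising an exception.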
If you are free, could you take a look at this? @yn224
Thanks!