cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0
322 stars 92 forks source link

[API] Report Display Row Organization Bug Fix & Addition of Stage Query functionality #395

Closed yn224 closed 2 years ago

yn224 commented 3 years ago

This PR attempts to fix a minor issue that the report displayer has, where it incorrectly places latency information when loop optimization primitives have been applied to different loops. Also, it enhances the display query capabilities for loop names using actual stages.

Bug fix

For instance, when the user applies pipeline to the first nested loop of E, the output looks like such:

+-----------+--------------+-----------+---------------------+---------------+------------------+
|           |   Trip Count |   Latency |   Iteration Latency |   Pipeline II |   Pipeline Depth |
|-----------+--------------+-----------+---------------------+---------------+------------------|
| B_x       |          400 |   2240800 |                5602 |           N/A |              N/A |
| + B_y     |          400 |      5600 |                  14 |           N/A |              N/A |
| E_x1_E_y1 |            3 |       123 |                  41 |           N/A |              N/A |
| D_x3      |            3 |        39 |                  13 |           N/A |              N/A |
| + D_y2    |       158404 |    792139 |                 N/A |             5 |              125 |
| ++ D_ra0  |          398 |     52138 |                 131 |           N/A |              N/A |
| +++ D_ra1 |          398 |  20751720 |               52140 |           N/A |              N/A |
| Fimg_x5   |          398 |     11144 |                  28 |           N/A |              N/A |
| + Fimg_y3 |          398 |   4436108 |               11146 |           N/A |              N/A |
+-----------+--------------+-----------+---------------------+---------------+------------------+
* Units in clock cycles

whereas the correct output should be this

+-----------+--------------+-----------+---------------------+---------------+------------------+
|           |   Trip Count |   Latency |   Iteration Latency |   Pipeline II |   Pipeline Depth |
|-----------+--------------+-----------+---------------------+---------------+------------------|
| B_x       |          400 |   2240800 |                5602 |           N/A |              N/A |
| + B_y     |          400 |      5600 |                  14 |           N/A |              N/A |
| E_x1_E_y1 |       158404 |    792139 |                 N/A |             5 |              125 |
| D_x3      |          398 |  20751720 |               52140 |           N/A |              N/A |
| + D_y2    |          398 |     52138 |                 131 |           N/A |              N/A |
| ++ D_ra0  |            3 |       123 |                  41 |           N/A |              N/A |
| +++ D_ra1 |            3 |        39 |                  13 |           N/A |              N/A |
| Fimg_x5   |          398 |   4436108 |               11146 |           N/A |              N/A |
| + Fimg_y3 |          398 |     11144 |                  28 |           N/A |              N/A |
+-----------+--------------+-----------+---------------------+---------------+------------------+
* Units in clock cycles

since the pipeline primitive has been applied to loop E instead of D_y2.

This example output is from an unoptimized version (commenting out all the optimization techniques) of the Sobel example mentioned in PR #384 except loop E pipelined on axis[1].

Query Functionality

Also, we can now use actual stages to query the loops instead of querying them by loop names. For instance, we can not only query the B loop by using report.display(loop=['B']) but also can use report.display(loop=[s[kernel.B]]).

yn224 commented 3 years ago

@seanlatias @Hecmay Can I ask for a review for this branch? I'll open new PR for separate improvements in the API.

seanlatias commented 3 years ago

Can we try to add unit tests for all these features? Now we only have tests on complete examples. It's hard to know which test contains which features.

yn224 commented 2 years ago

Can we try to add unit tests for all these features? Now we only have tests on complete examples. It's hard to know which test contains which features.

The only functionality added in this PR is querying by the stage directly instead of explicit notation of loop name, which may require actually running the HLS. Would you want the test to be able to run by having vhls set to true?

seanlatias commented 2 years ago

No. I mean we should add unit test (i.e., a test whose sole purpose is to test certain features) instead of using a complete application. I thought you also fix a bug in this PR? For the unit test, it should be a minimal program that can reproduce your bug. And we should see that the bug is fixed after your PR.