cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0
322 stars 92 forks

[Schedule] Support for intra-kernel data placement #436

Open hecmay opened 2 years ago

hecmay commented 2 years ago

This PR enhances the .systolic() and .to() primitives to better support intra-kernel data placement for systolic array generation with the AutoSA backend.

The .systolic() primitive is a push-button API that automatically maps the compute kernel to a systolic array, leaving the choice of dataflow pattern to the compiler. The .to() primitive gives expert designers more flexibility to explore the trade-offs between different systolic dataflows.
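For readers less familiar with what a "systolic dataflow" is, here is a minimal pure-Python sketch of one such dataflow (output-stationary) for matrix multiplication. It is only an illustration of the concept: it does not use HeteroCL or AutoSA, and the function name `systolic_matmul` is made up for this example.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A x B.

    PE (i, j) keeps C[i][j] in place. Rows of A flow left to right,
    columns of B flow top to bottom, one hop per cycle.
    """
    n, k_dim, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    # Operand registers of each PE; None marks a pipeline bubble.
    a_reg = [[None] * m for _ in range(n)]
    b_reg = [[None] * m for _ in range(n)]

    # The last PE (n-1, m-1) sees its final operands at cycle k_dim + n + m - 3.
    for t in range(k_dim + n + m - 2):
        # Shift A one PE to the right; row i is fed with a skew of i cycles.
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < k_dim else None
        # Shift B one PE down; column j is fed with a skew of j cycles.
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < k_dim else None
        # Each PE multiplies the operands that just arrived and accumulates.
        for i in range(n):
            for j in range(m):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Other dataflows keep a different operand stationary (for example the weights) and move the partial sums instead, which changes the buffering and bandwidth needs. That is the kind of trade-off the paragraph above refers to.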

I have resolved the dependency issues and installed AutoSA on our local server. In this PR, I will also add local CI/CD tests for systolic array programs that use the AutoSA backend.

hecmay commented 2 years ago

@zzzDavid @chhzh123 can you maybe take a quick pass on this PR? Thanks!

hecmay commented 2 years ago

Sorry for the late review. I’ve looked through the code, and I think the PR could use a more detailed description. It seems you have added several new features besides the AutoSA backend.

  1. I notice you introduced new APIs like transpose and pack, and new passes like transform_layout and explicit_unroll. Could you also describe these changes in this PR?
  2. Just a small question: you are not writing a C++ codegen for AutoSA, right? All the compilation happens at the Python level (except for some transformation passes).
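As a rough illustration of what a layout transform followed by packing usually means for a matrix operand, here is a pure-Python sketch. The helper names mirror the API names mentioned above, but these are not HeteroCL's implementations. The 8-bit element width and the lane order (first element in the low bits) are assumptions made for the example.

```python
def transpose(matrix):
    """Swap rows and columns so a former column becomes contiguous."""
    return [list(col) for col in zip(*matrix)]


def pack(values, factor, bits=8):
    """Pack `factor` unsigned `bits`-wide values into each wide word.

    The first value of every group lands in the lowest bits (assumed order).
    """
    if len(values) % factor != 0:
        raise ValueError("length must be a multiple of the packing factor")
    mask = (1 << bits) - 1
    words = []
    for start in range(0, len(values), factor):
        word = 0
        for lane, value in enumerate(values[start:start + factor]):
            word |= (value & mask) << (lane * bits)
        words.append(word)
    return words


def unpack(words, factor, bits=8):
    """Inverse of pack: split each wide word back into `factor` values."""
    mask = (1 << bits) - 1
    return [(w >> (lane * bits)) & mask for w in words for lane in range(factor)]


# A 4x2 operand: each column holds the values one PE column consumes.
B = [[1, 2], [3, 4], [5, 6], [7, 8]]
# After the transpose every column is a contiguous row, so four 8-bit
# elements can be fetched as a single 32-bit word.
packed = [pack(row, factor=4) for row in transpose(B)]
assert packed == [[0x07050301], [0x08060402]]
```

With this layout one wide memory access delivers four operands, instead of four narrow accesses with a stride.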

Thanks for pointing that out.

  1. These new APIs (e.g., packing, layout transformation) are necessary to generate a high-throughput memory subsystem for the GEMM systolic array. I will add more explanation of these new APIs.
  2. The AutoSA codegen in HCL is currently a mix of C++ and Python: the HLS/OpenCL code generator (the C++ part) calls a utility function (the Python part) that infers the CLI arguments and then invokes AutoSA. I can probably implement that utility function in C++, which would make the flow a bit cleaner.
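To give a feel for the Python side of that flow, here is a minimal sketch of a utility that turns inferred tiling parameters into an AutoSA command line. It is not the actual HeteroCL utility. The function names and parameters are hypothetical, and the flag names (`--target`, `--output-dir`, `--sa-sizes`) follow the example invocation in the AutoSA documentation as best recalled, so they should be checked against the installed AutoSA version.

```python
def format_sa_sizes(space_time, array_part, latency, simd):
    """Build the --sa-sizes string from tiling parameters (format assumed)."""
    def ints(xs):
        return ",".join(str(x) for x in xs)

    return (
        "{"
        f"kernel[]->space_time[{space_time}];"
        f"kernel[]->array_part[{ints(array_part)}];"
        f"kernel[]->latency[{ints(latency)}];"
        f"kernel[]->simd[{ints(simd)}]"
        "}"
    )


def build_autosa_command(kernel_path, output_dir, sa_sizes, autosa_bin="./autosa"):
    """Return the argument list; the caller decides when to run it."""
    return [
        autosa_bin,
        kernel_path,
        "--target=autosa_hls_c",  # assumed target name for HLS C output
        f"--output-dir={output_dir}",
        f"--sa-sizes={sa_sizes}",
    ]


sizes = format_sa_sizes(space_time=3, array_part=[16, 16, 16], latency=[8, 8], simd=[2])
cmd = build_autosa_command("kernel.c", "./autosa.tmp/output", sizes)
# To actually invoke AutoSA: subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string avoids quoting problems with the braces and brackets in `--sa-sizes`.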