cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0

Automatic IO optimization support #314

Open hecmay opened 3 years ago

hecmay commented 3 years ago

We need a pass to optimize memory accesses automatically. Examples:

Reference:

zhangzhiru commented 3 years ago

How do we plan to provide abstraction for these different I/O optimizations?

hecmay commented 3 years ago

Can you explain more? I don't quite follow what you mean by the abstraction of these optimizations.

If you are talking about the interface, users can manually specify those optimizations through .to() with different options:

s.to(A, target.HBM[0])  # we can add more options here
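For concreteness, the extra options might look like the sketch below. Every keyword name here is purely hypothetical (not existing HeteroCL API); they only illustrate the flavor of the knobs we could expose:

s.to(A, target.HBM[0],
     burst_len=256,    # hypothetical knob: burst length for AXI transfers
     num_ports=2,      # hypothetical knob: number of memory ports to use
     streaming=True)   # hypothetical knob: stream the tensor instead of fully buffering it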

If you are talking about the optimization algorithm: yes, we will need to build a cost model over the computation graph (under the hood we have a dataflow graph representation of the program, with well-defined structures including nodes, edges, and subgraphs), and optimize the program based on that cost model.
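As a very rough sketch of what such a cost-model-driven pass could look like (the graph attributes and the cost formula below are made up for illustration and do not reflect HeteroCL's actual IR):

def assign_hbm_banks(graph, num_banks=32):
    # Toy cost model: estimate traffic per edge as accesses * element size.
    cost = {e: e.num_accesses * e.tensor.dtype_bytes for e in graph.edges}
    placement = {}
    # Spread the most bandwidth-hungry tensors across separate HBM banks.
    for i, edge in enumerate(sorted(graph.edges, key=cost.get, reverse=True)):
        placement[edge.tensor] = "HBM[%d]" % (i % num_banks)
    return placement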

zhangzhiru commented 3 years ago

You described a set of low-level optimizations offered by the vendor. How do you plan to make them available to programmers? The important decisions are hiding in the comment "we can add more options here". What options? How do we categorize these options? Which ones are manual and which ones are automated? These are the so-called abstractions. If they end up looking exactly the same as what Vivado/Vitis offers, we are not making things easier for programmers.

hecmay commented 3 years ago

Since we are doing automatic optimization, all of these should be handled automatically if users do not specify any of them. The interface or abstraction is definitely much simpler than what Vivado/Vitis has.

Instead of adding more options to .to(), we can also use a context manager (as TVM does) to provide an interface for configuring or turning on/off these automatic optimizations:

with tvm.transform.PassContext(opt_level=3, config={"relay.backend.use_auto_scheduler": True}):
    lib = relay.build(mod, target=target, params=params)
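A HeteroCL analogue could look roughly like the following; hcl.transform.PassContext and the config keys are hypothetical placeholders (not existing HeteroCL API), and only the general shape of the interface is the point:

import heterocl as hcl

# Hypothetical interface sketch: PassContext and the config keys below do not
# exist in HeteroCL today; they simply mirror the TVM example above.
with hcl.transform.PassContext(opt_level=3,
                               config={"hcl.io.auto_placement": True,
                                       "hcl.io.burst_inference": True}):
    f = hcl.build(s, target=target)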