cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0

[API][Backend] Enhanced Data Movement Support #332

Closed hecmay closed 3 years ago

hecmay commented 3 years ago

This PR is split out from PR #321. It mainly introduces the following features (these four features are closely coupled, so I think it is better to keep them in the same PR).

API to customize diverse communication patterns

  1. Cross-device data movement, e.g., moving data between the CPU and FPGAs. Off-chip data access currently supports DMA and streaming modes (MMIO and SVM will be supported in the future).
    target = hcl.Platform.xilinx_zc706
    s.to(kernel.A, target.xcel, mode=hcl.IO.DMA)
    s.to(kernel.output, target.host, mode=hcl.IO.Stream)

    test cases: https://github.com/Hecmay/heterocl/blob/merge/tests/test_schedule_stream.py#L273-L376

In this example, HCL generates HLS code annotated with directives that guide EDA tools to instantiate an AXI-DMA IP for the SoC design, which handles off-chip data access through AXIS or memory-mapped AXI ports. For PCIe-connected cloud FPGAs, HCL lets users specify memory banks with subscript operators:

target = hcl.Platform.u280
s.to(kernel.A, target.xcel.DRAM[0])
s.to(kernel.output, target.host.HBM[0])

test cases: https://github.com/Hecmay/heterocl/blob/merge/tests/test_schedule_stream.py#L274-L299

  2. On-chip data movement. On-chip data can be moved point to point, or broadcast from a single producer to multiple consumers. HCL currently requires users to specify the FIFO depth. In the future, we will add support for automatic FIFO size inference.
s.to(kernel.A, [kernel.B, kernel.C], depth=1) # Broadcasting
s.to(kernel.A, kernel.B) # P2P data movement

test cases: https://github.com/Hecmay/heterocl/blob/merge/tests/test_schedule_stream.py#L134-L217

In this code snippet, HCL checks the underlying data access pattern. For random data access, HCL generates a double buffer between producers and consumers. Sequential data access (or access that can be made sequential by adding reuse buffers) is detected automatically, and HCL creates a FIFO for the data movement.
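As a rough illustration of the choice described above, here is a small Python sketch that picks a channel type from flattened access traces. This is not HCL's actual analysis: `choose_channel` and the trace representation are hypothetical, and the sketch assumes an access pattern counts as sequential only when the consumer reads each element exactly once, in the same order the producer wrote it.

```python
def choose_channel(producer_writes, consumer_reads):
    """Pick a channel type from flattened index traces (hypothetical helper)."""
    writes = list(producer_writes)
    reads = list(consumer_reads)
    # A FIFO only works if every element is written once and read once,
    # in exactly the same order on both sides.
    in_order = writes == reads
    no_repeats = len(set(writes)) == len(writes)
    if in_order and no_repeats:
        return "fifo"
    # Anything else is treated as random access and needs a double buffer.
    return "double_buffer"


print(choose_channel(range(4), range(4)))      # same order on both sides
print(choose_channel(range(4), [3, 1, 0, 2]))  # consumer reads out of order
```

The sketch ignores reuse buffers, which according to the description above can turn some non-sequential patterns into sequential ones.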

Abstract class for different compute and memory devices

HCL provides abstractions for shared buffers between the FPGA and the host (e.g., DRAM, HBM, PLRAM) as well as private buffers on the FPGA (e.g., BRAM, FF, URAM). Programmers can create custom platforms with different devices using the HCL-provided classes and APIs.

target = hcl.Platform.custom
target.add(hcl.dev.CPU("intel", "e5"))
target.add(hcl.dev.FPGA("xilinx", "u250"))

test cases: https://github.com/Hecmay/heterocl/blob/merge/tests/test_schedule_stream.py#L275-L299
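To make the device abstraction more concrete, the following Python sketch models a platform as a list of devices plus a shared/private memory classification. It is only a toy model under my own assumptions: `Device`, `Platform`, and `memory_scope` are hypothetical names and do not reflect how the `hcl.dev` classes are implemented.

```python
from dataclasses import dataclass, field

# Memory kinds as grouped in the description above.
SHARED_MEMORY = {"DRAM", "HBM", "PLRAM"}  # visible to both host and FPGA
PRIVATE_MEMORY = {"BRAM", "FF", "URAM"}   # on-chip, FPGA only


@dataclass
class Device:
    kind: str    # "CPU" or "FPGA"
    vendor: str
    model: str


@dataclass
class Platform:
    name: str
    devices: list = field(default_factory=list)

    def add(self, device):
        """Register a device and return the platform so calls can be chained."""
        self.devices.append(device)
        return self

    @property
    def host(self):
        # First CPU in the platform acts as the host in this sketch.
        return next(d for d in self.devices if d.kind == "CPU")

    @property
    def xcel(self):
        # First FPGA in the platform acts as the accelerator.
        return next(d for d in self.devices if d.kind == "FPGA")


def memory_scope(name):
    """Classify a memory kind as shared or private."""
    if name in SHARED_MEMORY:
        return "shared"
    if name in PRIVATE_MEMORY:
        return "private"
    raise ValueError(f"unknown memory kind: {name}")


target = Platform("custom")
target.add(Device("CPU", "intel", "e5")).add(Device("FPGA", "xilinx", "u250"))
print(target.host, target.xcel, memory_scope("HBM"))
```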

Compiler passes for host-device partitioning

We collect the placement information injected into the nodes of the DFG and partition the DFG into two non-overlapping subgraphs, one for the device and one for the host. The partitioning algorithm traverses the DFG nodes in post-order DFS. The nodes marked for placement on the FPGA are expected to form the boundary of the device subgraph. Nodes inside the boundary are automatically placed on the FPGA; all other nodes are placed on the host.

test cases: https://github.com/Hecmay/heterocl/blob/merge/tests/test_schedule_stream.py#L6-L55
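The partitioning idea can be sketched in a few lines of Python. This is a simplified illustration, not the compiler pass from this PR: the graph encoding (each node maps to its producers), the `to_device` / `to_host` sets, and the rule that a node fed by any device-resident producer is itself placed on the device are all assumptions made for the sketch.

```python
def partition(graph, to_device, to_host):
    """Split a DFG into device and host nodes.

    graph     : dict mapping each node to the list of its producers
    to_device : nodes whose data was explicitly moved to the device
    to_host   : nodes whose result was explicitly moved back to the host
    """
    placement = {}

    def visit(node):
        if node in placement:
            return
        for producer in graph[node]:
            visit(producer)
        # Post-order: every producer has a placement before the node itself.
        fed_by_device = any(
            placement[p] == "device" and p not in to_host
            for p in graph[node]
        )
        on_device = node in to_device or fed_by_device
        placement[node] = "device" if on_device else "host"

    for node in graph:
        visit(node)
    return placement


# pre -> A -> B -> out -> post, with A moved to the device and out moved back.
dfg = {"pre": [], "A": ["pre"], "B": ["A"], "out": ["B"], "post": ["out"]}
print(partition(dfg, to_device={"A"}, to_host={"out"}))
```

Here `A` and `out` form the boundary, `B` lands on the device because it sits between them, and `pre` and `post` stay on the host.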

OpenCL host code generator for AOCL and Vitis

For the host logic that calls the FPGA accelerator, we use a template-based approach for code generation. The host code template consists of three parts: 1) initialization (loading bitstreams, initializing the device function, etc.), 2) allocating and binding OpenCL device buffers, and 3) calling the device function and retrieving the output.

test cases: https://github.com/Hecmay/heterocl/blob/merge/tests/test_runtime_build.py#L6-L232
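A minimal sketch of what template-based host code generation could look like, using Python's `string.Template`. The template text, the `generate_host` helper, and the variable names in the emitted C code (`context`, `queue`, `kernel`, `err`) are hypothetical and are not taken from this PR; only the OpenCL calls themselves (`clCreateBuffer`, `clSetKernelArg`, `clEnqueueTask`, `clEnqueueReadBuffer`) are standard API functions.

```python
from string import Template

# Three-part template mirroring the structure described above.
HOST_TEMPLATE = Template("""\
// Part 1: initialization (load $bitstream, create kernel "$kernel")
// Platform-specific setup is omitted in this sketch.

// Part 2: allocate device buffers and bind them as kernel arguments
$buffers

// Part 3: launch the kernel and read back the output
clEnqueueTask(queue, kernel, 0, NULL, NULL);
clEnqueueReadBuffer(queue, buf_$output, CL_TRUE, 0, $out_size, $output, 0, NULL, NULL);
""")


def generate_host(kernel, bitstream, args, output):
    """Fill the template; args is a list of (name, size_in_bytes) pairs."""
    lines = []
    for index, (name, nbytes) in enumerate(args):
        lines.append(
            f"cl_mem buf_{name} = clCreateBuffer("
            f"context, CL_MEM_READ_WRITE, {nbytes}, NULL, &err);"
        )
        lines.append(
            f"clSetKernelArg(kernel, {index}, sizeof(cl_mem), &buf_{name});"
        )
    return HOST_TEMPLATE.substitute(
        bitstream=bitstream,
        kernel=kernel,
        buffers="\n".join(lines),
        output=output,
        out_size=dict(args)[output],
    )


print(generate_host("top", "kernel.xclbin", [("A", 4096), ("out", 4096)], "out"))
```

Keeping the fixed structure in a template and generating only the per-argument lines means AOCL and Vitis variants could differ mainly in the template text, though how the PR handles that split is not shown here.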

seanlatias commented 3 years ago

@Hecmay can you also extend your top comment? For each feature, list the corresponding APIs and test cases.

hecmay commented 3 years ago

Okay. Will do.

seanlatias commented 3 years ago

You still haven't provided the corresponding test cases for each feature.

seanlatias commented 3 years ago

@Hecmay please let me know if you are done. We should have at least one test case for each feature. If not, we should add one. Also, could you look through the existing test cases and see which are not useful? Some test cases are commented out and some test cases are not checking anything.

hecmay commented 3 years ago

@seanlatias Okay. I will clean up the test cases a bit and add the links here.

seanlatias commented 3 years ago

We can split them into two files if necessary.

hecmay commented 3 years ago

Issue #351

seanlatias commented 3 years ago

@Hecmay forgot to mention. Please check which issues are resolved by this PR. If any are, please create a PR that adds those issues as test cases. Thanks.