cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0

`.to` disables memory optimization #219

Closed chhzh123 closed 3 years ago

chhzh123 commented 4 years ago

When `.to` is used, memory optimization primitives such as `s.partition` stop working. Below is an example.

import heterocl as hcl
import numpy as np

def simple_add():
    A = hcl.placeholder((10, 10), "A", dtype=hcl.UInt(8))

    def kernel(A):
        # Element-wise B = A + 1
        B = hcl.compute(A.shape, lambda *args: A[args] + 1, "B", dtype=hcl.UInt(8))
        return B

    target = hcl.platform.zc706
    s = hcl.create_schedule([A], kernel)
    s.partition(A, hcl.Partition.Block, dim=1, factor=2)  # memory optimization
    s.to(A, target.xcel)         # stream A to the accelerator
    s.to(kernel.B, target.host)  # stream B back to the host
    target.config(compile="vivado_hls", mode="csim")
    f = hcl.build(s, target)

    np_A = np.random.randint(10, size=(10, 10))
    np_B = np.zeros((10, 10))

    hcl_A = hcl.asarray(np_A, dtype=hcl.UInt(8))
    hcl_B = hcl.asarray(np_B, dtype=hcl.UInt(8))
    f(hcl_A, hcl_B)
    ret_B = hcl_B.asnumpy()

simple_add()

Here, array A is partitioned. The code runs correctly without `.to`, but once the `.to` calls are added for streaming, the following error occurs.

Traceback (most recent call last):
  File "partition.py", line 34, in <module>
    simple_add()
  File "partition.py", line 23, in simple_add
    f = hcl.build(s, target)
  File "/home/chz/heterocl/python/heterocl/api.py", line 318, in build
    return _build(schedule.sch, new_inputs, target=target, name=name, stmt=stmt)
  File "/home/chz/heterocl/python/heterocl/tvm/build_module.py", line 543, in build
    return build_fpga_kernel(sch, args, target, name=name)
  File "/home/chz/heterocl/python/heterocl/tvm/build_module.py", line 491, in build_fpga_kernel
    return builder(fdevice, keys, vals)
  File "/home/chz/heterocl/python/heterocl/tvm/_ffi/function.py", line 280, in my_api_func
    return flocal(*args)
  File "/home/chz/heterocl/python/heterocl/tvm/_ffi/_ctypes/function.py", line 183, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/chz/heterocl/python/heterocl/tvm/_ffi/base.py", line 66, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
heterocl.tvm._ffi.base.TVMError: [00:19:30] src/codegen/codegen_source_base.cc:85: Check failed: it != var_idmap_.end() Find undefined Variable A

Placing `s.partition` after `.to` may let compilation pass, but `#pragma HLS array_partition` still does not appear in the generated kernel code.
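A quick way to check for the missing pragma is to scan the generated kernel source. The helper below is only a minimal sketch: `has_array_partition` is a hypothetical name, not part of HeteroCL, and it assumes the generated HLS code is already available as a string (for example, read from the generated kernel file).

```python
import re

def has_array_partition(hls_code, variable):
    """Return True if hls_code contains an array_partition pragma for `variable`.

    Hypothetical helper for illustration; not part of HeteroCL.
    """
    pattern = re.compile(
        r"^\s*#pragma\s+HLS\s+array_partition\b.*\bvariable\s*=\s*"
        + re.escape(variable)
        + r"\b",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.search(hls_code) is not None

# Example input mimicking the kind of kernel code discussed in this issue.
sample = (
    "ap_uint<8> A[10][10];\n"
    "#pragma HLS array_partition variable=A block dim=1 factor=2\n"
)
print(has_array_partition(sample, "A"))
print(has_array_partition(sample, "B"))
```

A check like this could back a regression test for this issue: build the schedule with both `.to` and `s.partition`, then assert that the pragma for `A` is present.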

hecmay commented 4 years ago

Can you try this?

A_ = s.to(A, target.xcel)
s.partition(A_, hcl.Partition.Block, dim=1, factor=2)

chhzh123 commented 4 years ago

No, this doesn't work.

Traceback (most recent call last):
  File "partition.py", line 34, in <module>
    partition_test()
  File "partition.py", line 23, in partition_test
    f = hcl.build(s, target)
  File "/home/chz/heterocl/python/heterocl/api.py", line 318, in build
    return _build(schedule.sch, new_inputs, target=target, name=name, stmt=stmt)
  File "/home/chz/heterocl/python/heterocl/tvm/build_module.py", line 546, in build
    return build_fpga_kernel(sch, args, target, name=name)
  File "/home/chz/heterocl/python/heterocl/tvm/build_module.py", line 422, in build_fpga_kernel
    flist = lower(sch, args, kernel_only=True, name=name)
  File "/home/chz/heterocl/python/heterocl/tvm/build_module.py", line 340, in lower
    sch = schedule.ScopePartition(sch)
  File "/home/chz/heterocl/python/heterocl/tvm/_ffi/function.py", line 280, in my_api_func
    return flocal(*args)
  File "/home/chz/heterocl/python/heterocl/tvm/_ffi/_ctypes/function.py", line 183, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/chz/heterocl/python/heterocl/tvm/_ffi/base.py", line 66, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
heterocl.tvm._ffi.base.TVMError: [01:26:19] src/schedule/schedule_reorder.cc:78: Check failed: i-1 > 0 (-1 vs. 0) wrong op ordering fonud

hecmay commented 4 years ago

Fixed in #206.

chhzh123 commented 4 years ago

A weird dataflow pragma is generated after #206 is merged. Some codegen logic must be wrong.

// HASH:1285778973
#include <ap_int.h>
#include <ap_fixed.h>
#include <ap_int.h>
#include <ap_fixed.h>
#include <ap_axi_sdata.h>
#include <hls_stream.h>
#include <math.h>
#include <stdint.h>
#include "kernel.h"
  void test(hls::stream<ap_uint<8> >& A_channel, hls::stream<ap_uint<8> >& B_channel) {
  #pragma HLS INTERFACE axis port=A_channel offset=slave bundle=gmem0
  #pragma HLS INTERFACE axis port=B_channel offset=slave bundle=gmem1
  #pragma HLS INTERFACE s_axilite port=return bundle=control

  #pragma HLS dataflow // HERE!!!
    ap_uint<8> B[10][10];
    ap_uint<8> A[10][10];
    #pragma HLS array_partition variable=A block dim=1 factor=2
    bit32 A_partitioned;
LOOP1: for (bit32 A0 = 0; A0 < 10; ++A0) {
      for (bit32 A1 = 0; A1 < 10; ++A1) {
        A[A0][A1] = A_channel.read();
      }
    }
LOOP2: for (bit32 args = 0; args < 10; ++args) {
      for (bit32 args0 = 0; args0 < 10; ++args0) {
        B[args][args0] = ((ap_uint<8>)(((ubit32)A[args][args0]) + 1U));
      }
    }
LOOP3: for (bit32 B0 = 0; B0 < 10; ++B0) {
      for (bit32 B1 = 0; B1 < 10; ++B1) {
        B_channel.write(B[B0][B1]);
      }
    }
  }

hecmay commented 4 years ago

The dataflow pragma is inserted by default: https://github.com/cornell-zhang/heterocl/blob/master/tvm/src/codegen/hlsc/codegen_vhls.cc#L696

Ideally, we want the read/write stages and the main logic to overlap, but this decision should be exposed to users, so we need to add a primitive for dataflow.

zhangzhiru commented 4 years ago

@chhzh123 Please add test cases under tests/issues/219

chhzh123 commented 4 years ago

> The dataflow pragma is inserted by default: https://github.com/cornell-zhang/heterocl/blob/master/tvm/src/codegen/hlsc/codegen_vhls.cc#L696

It seems the pragma is not really generated "by default", since this is the only case in which it is generated automatically.

chhzh123 commented 4 years ago

I'll add a primitive for dataflow and some test cases for this issue soon.

hecmay commented 4 years ago

Thanks @chhzh123. The condition for generating the dataflow pragma is very inaccurate. It would be great if you could replace it with a better one, or just remove it.

The dataflow primitive should be straightforward to add: you could add an attribute to the For node and the KernelDef node, and print the pragma in the VHLS codegen.
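The idea can be sketched with a toy model in Python. This is not HeteroCL's actual IR or codegen: `ToyFor`, its `dataflow` field, and `emit` are hypothetical names used only to show a pragma being printed when a node carries the attribute, rather than by a heuristic condition.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToyFor:
    """Toy stand-in for a loop node; not HeteroCL's actual For node."""
    var: str
    extent: int
    body: List[object] = field(default_factory=list)
    dataflow: bool = False  # hypothetical attribute set by a dataflow primitive

def emit(node, indent=0):
    """Return the C-like lines for a node; strings are emitted as statements."""
    pad = "  " * indent
    if isinstance(node, str):
        return [pad + node]
    lines = [
        f"{pad}for (int {node.var} = 0; {node.var} < {node.extent}; ++{node.var}) {{"
    ]
    # The pragma is printed only when the user explicitly requested it.
    if node.dataflow:
        lines.append(pad + "  #pragma HLS dataflow")
    for child in node.body:
        lines.extend(emit(child, indent + 1))
    lines.append(pad + "}")
    return lines

loop = ToyFor("i", 10, ["B[i] = A[i] + 1;"], dataflow=True)
print("\n".join(emit(loop)))
```

With this design, a loop without the attribute produces no dataflow pragma at all, which makes the decision visible to the user.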

chhzh123 commented 4 years ago

> A weird dataflow pragma is generated after #206 is merged. Some codegen logic must be wrong.

Fixed in #245.

hecmay commented 3 years ago

Issue fixed. Test cases added here: https://github.com/cornell-zhang/heterocl/blob/heteroflow/tests/issues/test_issue_219.py