Closed chhzh123 closed 3 years ago
Can you try this?
A_ = s.to(A, target.xcel)
s.partition(A_, hcl.Partition.Block, dim=1, factor=2)
No, this doesn't work.
Traceback (most recent call last):
File "partition.py", line 34, in <module>
partition_test()
File "partition.py", line 23, in partition_test
f = hcl.build(s, target)
File "/home/chz/heterocl/python/heterocl/api.py", line 318, in build
return _build(schedule.sch, new_inputs, target=target, name=name, stmt=stmt)
File "/home/chz/heterocl/python/heterocl/tvm/build_module.py", line 546, in build
return build_fpga_kernel(sch, args, target, name=name)
File "/home/chz/heterocl/python/heterocl/tvm/build_module.py", line 422, in build_fpga_kernel
flist = lower(sch, args, kernel_only=True, name=name)
File "/home/chz/heterocl/python/heterocl/tvm/build_module.py", line 340, in lower
sch = schedule.ScopePartition(sch)
File "/home/chz/heterocl/python/heterocl/tvm/_ffi/function.py", line 280, in my_api_func
return flocal(*args)
File "/home/chz/heterocl/python/heterocl/tvm/_ffi/_ctypes/function.py", line 183, in __call__
ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
File "/home/chz/heterocl/python/heterocl/tvm/_ffi/base.py", line 66, in check_call
raise TVMError(py_str(_LIB.TVMGetLastError()))
heterocl.tvm._ffi.base.TVMError: [01:26:19] src/schedule/schedule_reorder.cc:78: Check failed: i-1 > 0 (-1 vs. 0) wrong op ordering fonud
Fixed in #206.
A weird `dataflow` pragma is generated after #206 is merged. Some codegen logic must be wrong.
// HASH:1285778973
#include <ap_int.h>
#include <ap_fixed.h>
#include <ap_int.h>
#include <ap_fixed.h>
#include <ap_axi_sdata.h>
#include <hls_stream.h>
#include <math.h>
#include <stdint.h>
#include "kernel.h"
void test(hls::stream<ap_uint<8> >& A_channel, hls::stream<ap_uint<8> >& B_channel) {
#pragma HLS INTERFACE axis port=A_channel offset=slave bundle=gmem0
#pragma HLS INTERFACE axis port=B_channel offset=slave bundle=gmem1
#pragma HLS INTERFACE s_axilite port=return bundle=control
#pragma HLS dataflow // HERE!!!
ap_uint<8> B[10][10];
ap_uint<8> A[10][10];
#pragma HLS array_partition variable=A block dim=1 factor=2
bit32 A_partitioned;
LOOP1: for (bit32 A0 = 0; A0 < 10; ++A0) {
for (bit32 A1 = 0; A1 < 10; ++A1) {
A[A0][A1] = A_channel.read();
}
}
LOOP2: for (bit32 args = 0; args < 10; ++args) {
for (bit32 args0 = 0; args0 < 10; ++args0) {
B[args][args0] = ((ap_uint<8>)(((ubit32)A[args][args0]) + 1U));
}
}
LOOP3: for (bit32 B0 = 0; B0 < 10; ++B0) {
for (bit32 B1 = 0; B1 < 10; ++B1) {
B_channel.write(B[B0][B1]);
}
}
}
The dataflow pragma is inserted by default: https://github.com/cornell-zhang/heterocl/blob/master/tvm/src/codegen/hlsc/codegen_vhls.cc#L696
Ideally we want the read/write stages and the main logic to overlap, but we should expose this decision to users, so we need to add a primitive for dataflow.
@chhzh123 Please add test cases under tests/issues/219
> The dataflow pragma is inserted by default: https://github.com/cornell-zhang/heterocl/blob/master/tvm/src/codegen/hlsc/codegen_vhls.cc#L696
It seems the pragma is not generated "by default", since only this case triggers automatic generation.
I'll add a primitive for dataflow and some test cases for this issue soon.
Thanks @chhzh123. The condition for the dataflow pragma generation is very inaccurate. It would be great if you could replace it with a better one, or just remove it.
The dataflow primitive should be straightforward to add: you may add an attribute to the `For` node and the `KernelDef` node, and print out the pragma in VHLS codegen.
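To make the suggestion concrete, here is a minimal, self-contained Python sketch of the idea. It is not HeteroCL code: the `Stmt`, `For`, and `KernelDef` classes and the `emit` function are hypothetical stand-ins for the real IR nodes and the VHLS codegen. It only illustrates that the pragma is printed when a user-set attribute is present, rather than being inferred by a heuristic.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    """Toy statement that is emitted verbatim (hypothetical, for illustration)."""
    text: str


@dataclass
class For:
    """Toy loop node; `dataflow` stands in for the attribute a schedule primitive would set."""
    var: str
    extent: int
    body: list = field(default_factory=list)
    dataflow: bool = False


@dataclass
class KernelDef:
    """Toy kernel node carrying the same hypothetical `dataflow` attribute."""
    name: str
    body: list = field(default_factory=list)
    dataflow: bool = False


def emit(node, indent=0):
    """Return the HLS-like source lines for a node, printing the pragma only if requested."""
    pad = "  " * indent
    if isinstance(node, Stmt):
        return [pad + node.text]
    if isinstance(node, For):
        header = f"{pad}for (int {node.var} = 0; {node.var} < {node.extent}; ++{node.var}) {{"
    else:  # KernelDef
        header = f"{pad}void {node.name}() {{"
    lines = [header]
    if node.dataflow:
        # The pragma comes from the explicit attribute, not from a codegen guess.
        lines.append(pad + "  #pragma HLS dataflow")
    for child in node.body:
        lines.extend(emit(child, indent + 1))
    lines.append(pad + "}")
    return lines


kernel = KernelDef("test", [For("i", 10, [Stmt("work(i);")])], dataflow=True)
print("\n".join(emit(kernel)))
```

With `dataflow=False` (the default in this sketch) no pragma is emitted, which matches the goal of leaving the decision to the user.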
> A weird `dataflow` pragma is generated after #206 is merged. Some codegen logic must be wrong.

Fixed in #245.
Issue fixed. Test cases added here: https://github.com/cornell-zhang/heterocl/blob/heteroflow/tests/issues/test_issue_219.py
When `.to` is used, memory optimization commands fail to work. Below is an example. Here, a `partition` function is applied to array `A`. This piece of code runs correctly without `.to`. However, when streaming is added, the following error occurs. Putting `s.partition` after `.to` may pass compilation, but `#pragma HLS array_partition` is still not shown in the kernel code.
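For readers unfamiliar with the pragma, the sketch below shows what `block dim=1 factor=2` is expected to do to the 10x10 array `A`, assuming the usual HLS meaning of block partitioning (the chosen dimension is split into `factor` equally sized blocks of consecutive indices). The helper name `block_partition_bank` is made up for illustration and is not part of HeteroCL or the HLS tools.

```python
import math


def block_partition_bank(index, dim_size, factor):
    """Return the bank that holds `index` under block partitioning (illustrative helper)."""
    block = math.ceil(dim_size / factor)  # number of consecutive indices per bank
    return index // block


# Rows of A (dim=1, size 10) split across 2 banks.
banks = [block_partition_bank(row, 10, 2) for row in range(10)]
print(banks)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Under this assumption, rows 0 to 4 land in one bank and rows 5 to 9 in the other, so two rows from different halves can be accessed in the same cycle.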