hecmay closed this issue 3 years ago
@Hecmay Did you add back the interface for Vivado HLS? #261 is not only a Vitis problem, but also works for Vivado HLS.
You mean Vivado HLS also requires the port width to be a multiple of 8? No. Vivado HLS does not require the interface pragmas; they are only required in Vitis 2019.2.
I mean our codegen will generate
#pragma HLS INTERFACE
for Vivado HLS now, which requires the bitwidth of the input argument to be a multiple of 8.
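For context, the Vitis 2019.2 restriction amounts to rounding every port width up to the next multiple of 8. This is an illustrative sketch only; `pad_port_width` is a made-up helper name, not a HeteroCL API:

```python
def pad_port_width(bitwidth):
    """Round a bit width up to the next multiple of 8.

    Vitis 2019.2 rejects AXI interface ports whose width is not a
    multiple of 8, so a codegen that emits the INTERFACE pragma
    would have to pad odd widths like this.
    """
    return -(-bitwidth // 8) * 8  # ceiling division, then scale back up
```

For example, a 6-bit argument would be padded to an 8-bit port, and a 9-bit argument to a 16-bit port.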
Removed now. Now the interface pragmas are only intended for Vitis flow.
It is interesting. I just copied @chhzh123 your changes to this PR, and got the same error from Keras, even though we did not touch Keras at all...
Maybe the Keras test can be commented out. I think it is not our major focus at this time.
The following stages cannot be streamed correctly. The kernel declares buffers first and then updates them with imperative code:
import heterocl as hcl
import numpy as np

def test_zero():
    A = hcl.placeholder((10,), "A")

    def kernel(A):
        B = hcl.compute(A.shape, lambda i: A[i] + 1, "B")
        C1 = hcl.compute(A.shape, lambda i: 0, "C1")
        C2 = hcl.compute(A.shape, lambda i: 0, "C2")
        def foo(i):
            C1[i] = B[i] + 1
            C2[i] = C1[i] + 1
        hcl.mutate((10,), lambda i: foo(i), "C")
        D = hcl.compute(A.shape, lambda i: C2[i] + 1, "D")
        return D

    target = hcl.platform.zc706
    target.config(compile="vivado_hls", mode="csim")
    s = hcl.create_schedule([A], kernel)
    s.to([A], target.xcel)
    s.to(kernel.D, target.host)
    s.to(kernel.B, s[kernel.C1])
    s.to(kernel.C2, s[kernel.D])
    f = hcl.build(s, target)

    np_A = np.zeros((10,))
    np_D = np.zeros((10,))
    hcl_A = hcl.asarray(np_A)
    hcl_D = hcl.asarray(np_D)
    f(hcl_A, hcl_D)
The generated code does not capture the C1 and C2 arrays in the foo function, but captures the zero placeholders instead.
void test(bit32 A[10], bit32 D[10]) {
  bit32 _top;
  bit32 B[10];
  bit32 B_pipe_1[10];
  #pragma HLS dataflow
  #pragma HLS stream variable=B_pipe_1 depth=1
  B_i: for (bit32 i = 0; i < 10; ++i) {
    bit32 B_temp;
    B_temp = (A[i] + 1);
    B_pipe_1[i] = B_temp;
  }
  bit32 C1[10];
  bit32 C2[10];
  bit32 C2_pipe_2[10];
  #pragma HLS stream variable=C2_pipe_2 depth=1
  C2_i1: for (bit32 i1 = 0; i1 < 10; ++i1) {
    bit32 C2_temp;
    C2_temp = 0;
    C2_pipe_2[i1] = C2_temp;
    C2[i1] = C2_temp;
  }
  bit32 C;
  C_i2: for (bit32 i2 = 0; i2 < 10; ++i2) {
    C1[i2] = (B[i2] + 1);
    C2[i2] = (C1[i2] + 1);
  }
  D_i3: for (bit32 i3 = 0; i3 < 10; ++i3) {
    bit32 C2_temp1;
    C2_temp1 = C2_pipe_2[i3];
    D[i3] = (C2_temp1 + 1);
  }
}
The generated code does not capture the C1 and C2 arrays in the foo function
I do not quite understand what you mean here. The C1 and C2 arrays are both initialized with all zero values, so in the generated code, we just create two allocate statements for them.
The FIFO is not created correctly. You should consider using the following style
# Stage C is the actual consumer of tensor B
s.to(kernel.B, s[kernel.C])
# You want to stream the C2 (the C2 modified after C stage) to the D stage
s.to(kernel.C.C2, s[kernel.D])
I just ran it on the server. And it should be able to generate the code we expected. Please see the test case: https://github.com/Hecmay/heterocl/blob/fix/tests/issues/test_issue_284.py#L29
The FIFO is not created correctly. You should consider using the following style
# Stage C is the actual consumer of tensor B
s.to(kernel.B, s[kernel.C])
# You want to stream the C2 (modified after C stage) to the D stage
s.to(kernel.C.C2, s[kernel.D])
Yes, it works. Thanks!
I added RapidJSON, a header-only C++ library for JSON, to the HCL codebase. Since the library is not too big, can we just include the source code in our repo? If that is not preferable, we can git clone it. @seanlatias @zhangzhiru
Error message I got on CI:
=================================== FAILURES ===================================
_______________________________ test_tutorial_08 _______________________________
def test_tutorial_08():
> import tutorial_08_backend
tutorials/test_tutorial.py:31:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tutorials/tutorial_08_backend.py:38: in <module>
f(hcl_A, hcl_B)
../.local/lib/python3.6/site-packages/heterocl-0.1-py3.6.egg/heterocl/tvm/_ffi/function.py:128: in __call__
return f(*args)
../.local/lib/python3.6/site-packages/heterocl-0.1-py3.6.egg/heterocl/tvm/_ffi/_ctypes/function.py:183: in __call__
ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ret = -1
def check_call(ret):
"""Check the return value of C API call
This function will raise exception when error occurs.
Wrap every API call with this function
Parameters
----------
ret : int
return value from API calls
"""
if ret != 0:
> raise TVMError(py_str(_LIB.TVMGetLastError()))
E heterocl.tvm._ffi.base.TVMError: [16:40:41] src/codegen/llvm/llvm_module.cc:59: Check failed: ret == 0 (-1 vs. 0) Assert fail: ((((tvm_struct_get(arg1, 0, 5) == (uint8)0) && (tvm_struct_get(arg1, 0, 6) == (uint8)32)) && (tvm_struct_get(arg1, 0, 8) == (uint8)0)) && (tvm_struct_get(arg1, 0, 7) == (uint8)1)), arg1.dtype is expected to be int32
E
E Stack trace returned 10 entries:
E [bt] (0) /home/circleci/.local/lib/python3.6/site-packages/heterocl-0.1-py3.6.egg/lib/libhcl.so(dmlc::StackTrace[abi:cxx11]()+0x40) [0x7ff263dbefa0]
E [bt] (1) /home/circleci/.local/lib/python3.6/site-packages/heterocl-0.1-py3.6.egg/lib/libhcl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x29) [0x7ff263dbf689]
E [bt] (2) /home/circleci/.local/lib/python3.6/site-packages/heterocl-0.1-py3.6.egg/lib/libhcl.so(TVM::codegen::LLVMModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<TVM::runtime::ModuleNode> const&)::{lambda(TVM::runtime::TVMArgs, TVM::runtime::TVMRetValue*)#2}::operator()(TVM::runtime::TVMArgs, TVM::runtime::TVMRetValue*) const+0x18c) [0x7ff2640b718c]
E [bt] (3) /home/circleci/.local/lib/python3.6/site-packages/heterocl-0.1-py3.6.egg/lib/libhcl.so(std::_Function_handler<void (TVM::runtime::TVMArgs, TVM::runtime::TVMRetValue*), TVM::codegen::LLVMModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<TVM::runtime::ModuleNode> const&)::{lambda(TVM::runtime::TVMArgs, TVM::runtime::TVMRetValue*)#2}>::_M_invoke(std::_Any_data const&, TVM::runtime::TVMArgs&&, TVM::runtime::TVMRetValue*&&)+0x17) [0x7ff2640b7267]
E [bt] (4) /home/circleci/.local/lib/python3.6/site-packages/heterocl-0.1-py3.6.egg/lib/libhcl.so(TVMFuncCall+0x4c) [0x7ff26420344c]
However, I was not able to reproduce this error on our local servers...
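One unverified guess (the failure only shows up on CI): the assert complains that arg1.dtype is expected to be int32, and NumPy arrays created without an explicit dtype default to float64, so the array handed to the compiled function may simply carry the wrong dtype on the CI machine. A minimal illustration of the mismatch:

```python
import numpy as np

# The TVM runtime checks each argument's dtype against the compiled
# function signature. np.zeros defaults to float64, which would fail
# a check expecting int32; passing the dtype explicitly avoids that.
np_bad = np.zeros((10,))                   # dtype=float64
np_ok = np.zeros((10,), dtype=np.int32)    # matches the expected int32
```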
I think the default FIFO depth should be set as the size of the array, which can guarantee the correctness of an unoptimized dataflow program.
I think the default FIFO depth should be set as the size of the array, which can guarantee the correctness of an unoptimized dataflow program.
Yeah. That's a good suggestion. I can change that. Thanks!
I think the default FIFO depth should be set as the size of the array, which can guarantee the correctness of an unoptimized dataflow program.
What's an unoptimized dataflow program? One that is not actually pipelined? This conservative solution is fine for small tensors. But in general, the area overhead is huge.
I think the default FIFO depth should be set as the size of the array, which can guarantee the correctness of an unoptimized dataflow program.
What's an unoptimized dataflow program? One that is not actually pipelined? This conservative solution is fine for small tensors. But in general, the area overhead is huge.
The program that has not been optimized for area. I mean that if users do not specify the FIFO depth, our default depth should be set large enough. If users know what they are doing, they can set a smaller depth to consume less area.
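The policy discussed above can be sketched as a small helper (illustrative only; `default_fifo_depth` is a made-up name, not a real HeteroCL API):

```python
from functools import reduce

def default_fifo_depth(shape, user_depth=None):
    """Choose a FIFO depth for a streamed tensor.

    Conservative default: the total element count of the tensor,
    which keeps an unoptimized dataflow program correct at the
    cost of extra area. A user who knows the access pattern can
    pass a smaller depth explicitly.
    """
    if user_depth is not None:
        return user_depth
    return reduce(lambda a, b: a * b, shape, 1)
```

For a (10,) tensor this defaults to a depth of 10; a user-specified depth always takes priority.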
I tried to run the KMeans (optimized) design on AWS. It turned out the design synthesized by Vitis HLS has slightly worse performance than Vivado HLS (the default HLS tool in Vitis 2019.2). I think I need to fall back to Vitis 2019 on AWS and try again.
@Hecmay #259 is merged.
@whbldhwj Jie, please see the latest example here: https://github.com/Hecmay/heterocl/blob/fix/tests/test_schedule_systolic.py#L6-L31
The data type has been updated -- we no longer have those nested casts for the AutoSA module. We also have the header and main body sections for the imported IP.
Moved to #316
- Fixed reuse_at (by adding an IR pass to fix the buffer binding issues). Should fix #264, #230, #219, and #154.
- Added s.subgraph(). This is needed for dataflow support (#245), test_runtime_build, and test_schedule_stream.
- Used hls::stream to implement FIFOs in the Vivado HLS backend (NB: this is less stable than pragma-annotated multi-dimensional arrays, based on what we observed in Vivado HLS 2019.2). Should fix #286.

Other new features:
- HCL_DEBUG_LEVEL(level) macro (enabled when HCL_DEBUG_ON is set)
- s.to(tensor, p.xcel.BRAM). Should fix #281.

Fixed bugs:
- Error in python: double free or corruption (!prev)
- Renamed loop.1 to loop_1.