Open chhzh123 opened 4 years ago
Can you show your use case? I'm thinking maybe I wrote this for some specific reasons. I guess Simplify is used in many places and sometimes we don't want to lose the attribute information.
I'm wondering what kind of HLS error you are facing. A loopcount=1 loop shouldn't cause any error in HLS (?)
I sort of know why I did this. It was because we wanted to connect our flow with PolySA before. Actually, the attributes in hlib might not be that useful now.
Can you show your use case? I'm thinking maybe I wrote this for some specific reasons. I guess Simplify is used in many places and sometimes we don't want to lose the attribute information.
This case is somewhat tricky, but it does cause the error. See the code below.
from collections import OrderedDict

import numpy as np
import heterocl as hcl
import heterocl.tvm as tvm  # assumed import path for tvm, as used in hlib

def test_dataflow():
    A = hcl.placeholder((1,10), "A")
    def kernel(A):
        # Three chained stages, each tagged with an 'app' attribute
        B = hcl.compute(A.shape,
                lambda i, j: A[i, j] + 1, "B", attrs=OrderedDict([('app',tvm.make.StringImm('B'))]))
        C = hcl.compute(B.shape,
                lambda i, j: B[i, j] + 1, "C", attrs=OrderedDict([('app',tvm.make.StringImm('C'))]))
        D = hcl.compute(C.shape,
                lambda i, j: C[i, j] + 1, "D", attrs=OrderedDict([('app',tvm.make.StringImm('D'))]))
        return D
    target = hcl.platform.zc706
    target.config(compile="vivado_hls", mode="csyn")
    s = hcl.create_schedule([A], kernel)
    # Move A to the accelerator and D back to the host
    s.to([A], target.xcel)
    s.to(kernel.D, target.host)
    # Stream B into stage C and C into stage D
    s.to(kernel.B, s[kernel.C])
    s.to(kernel.C, s[kernel.D])
    f = hcl.build(s, target)
    np_A = np.zeros((1,10))
    np_D = np.zeros((1,10))
    hcl_A = hcl.asarray(np_A)
    hcl_D = hcl.asarray(np_D)
    f(hcl_A, hcl_D)
attrs are attached to hcl.compute, so the loops with trip count 1 cannot be eliminated.
void test(bit32 A[1][10], bit32 D[1][10]) {
  bit32 B_pipe_1[1][10];
  #pragma HLS stream variable=B_pipe_1 depth=1
  #pragma HLS dataflow
  B_i: for (bit32 i = 0; i < 1; ++i) {
    B_j: for (bit32 j = 0; j < 10; ++j) {
      bit32 B_temp;
      B_temp = (A[i][j] + 1);
      B_pipe_1[i][j] = B_temp;
    }
  }
  bit32 C_pipe_2[1][10];
  #pragma HLS stream variable=C_pipe_2 depth=2
  C_i1: for (bit32 i1 = 0; i1 < 1; ++i1) {
    C_j1: for (bit32 j1 = 0; j1 < 10; ++j1) {
      bit32 B_temp1;
      B_temp1 = B_pipe_1[i1][j1];
      bit32 C_temp;
      C_temp = (B_temp1 + 1);
      C_pipe_2[i1][j1] = C_temp;
    }
  }
  D_i2: for (bit32 i2 = 0; i2 < 1; ++i2) {
    D_j2: for (bit32 j2 = 0; j2 < 10; ++j2) {
      bit32 C_temp1;
      C_temp1 = C_pipe_2[i2][j2];
      D[i2][j2] = (C_temp1 + 1);
    }
  }
}
Then, when this piece of code is passed to Vivado HLS, the outer loops with trip count 1 are automatically unrolled. After that, Vivado HLS can no longer distinguish the different stages (only one process function, Block_codeRepl8_proc7, is detected here), which causes a synthesis error.
INFO: [XFORM 203-502] Unrolling small iteration loop 'B_i' (kernel.cpp:16) in function 'test' automatically.
INFO: [XFORM 203-502] Unrolling small iteration loop 'C_i1' (kernel.cpp:25) in function 'test' automatically.
INFO: [XFORM 203-502] Unrolling small iteration loop 'D_i2' (kernel.cpp:34) in function 'test' automatically.
INFO: [XFORM 203-501] Unrolling loop 'B_i' (kernel.cpp:16) in function 'test' completely.
INFO: [XFORM 203-501] Unrolling loop 'C_i1' (kernel.cpp:25) in function 'test' completely.
INFO: [XFORM 203-501] Unrolling loop 'D_i2' (kernel.cpp:34) in function 'test' completely.
INFO: [XFORM 203-712] Applying dataflow to function 'test', detected/extracted 1 process function(s):
'Block_codeRepl8_proc7'.
ERROR: [XFORM 203-123] Cannot stream 'C_pipe_2.V2': a local variable is streamable only if it is in a dataflow region.
ERROR: [HLS 200-70] Pre-synthesis failed.
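The symptom in this log can be checked mechanically. Below is a minimal sketch, assuming the message format shown above stays the same; the helper names `count_dataflow_processes` and `error_lines` and the `expected_stages` value are hypothetical, chosen only for illustration. It compares the number of process functions Vivado HLS reports against the number of stages in the schedule.

```python
import re
from typing import List, Optional

# Excerpt copied from the log in this thread.
SAMPLE_LOG = """\
INFO: [XFORM 203-712] Applying dataflow to function 'test', detected/extracted 1 process function(s):
'Block_codeRepl8_proc7'.
ERROR: [XFORM 203-123] Cannot stream 'C_pipe_2.V2': a local variable is streamable only if it is in a dataflow region.
ERROR: [HLS 200-70] Pre-synthesis failed.
"""

def count_dataflow_processes(log: str) -> Optional[int]:
    """Return the process count from the dataflow message, or None if absent."""
    match = re.search(r"detected/extracted (\d+) process function\(s\)", log)
    return int(match.group(1)) if match else None

def error_lines(log: str) -> List[str]:
    """Collect every line that the tool marks as an error."""
    return [line for line in log.splitlines() if line.startswith("ERROR:")]

expected_stages = 3  # B, C and D in the example (assumption for this sketch)
found = count_dataflow_processes(SAMPLE_LOG)
# Fewer processes than stages suggests the stage boundaries were merged.
stages_merged = found is not None and found < expected_stages
print(found, stages_merged, len(error_lines(SAMPLE_LOG)))
```

A count lower than the number of stages is only a hint that the dataflow region was collapsed; the error lines still need to be read for the actual cause.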
@chhzh123, so if the loops are eliminated, they can work? I'm wondering whether the HLS tool also unrolls loops with trip counts other than 1, for example trip count = 2. If that's the case, it looks more like an HLS bug.
If the loops are eliminated, they can work?
Yes, it can work. HLS can detect three functions here.
I just tried trip count=2 and it works. So I guess only the special case of trip count=1 causes the problem. @zhangzhiru do you think this is an HLS bug? We can certainly remove the loops with trip count=1 ourselves, but that shouldn't be necessary.
Maybe the best way is to generate modules/functions to explicitly distinguish different stages.
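To make that suggestion concrete, here is a rough sketch of a code generator that wraps each stage in its own C++ function and calls them from a single dataflow region. This is not HeteroCL's code generator: `emit_stage`, `emit_top`, `emit_kernel`, the `stage_` prefix, the `_pipe` buffer names and the fixed `depth=1` are all hypothetical, and the emitted C++ has not been run through Vivado HLS.

```python
from typing import List, Tuple

def emit_stage(name: str, shape: Tuple[int, int]) -> str:
    """Emit one C++ function holding the loop nest of a single stage."""
    rows, cols = shape
    return "\n".join([
        f"void stage_{name}(bit32 src[{rows}][{cols}], bit32 dst[{rows}][{cols}]) {{",
        f"  for (bit32 i = 0; i < {rows}; ++i) {{",
        f"    for (bit32 j = 0; j < {cols}; ++j) {{",
        "      dst[i][j] = src[i][j] + 1;  // same +1 body as the example stages",
        "    }",
        "  }",
        "}",
    ])

def emit_top(top: str, stages: List[str], shape: Tuple[int, int]) -> str:
    """Emit the top function: one call per stage inside a dataflow region."""
    rows, cols = shape
    dims = f"[{rows}][{cols}]"
    pipes = [f"{s}_pipe" for s in stages[:-1]]  # one buffer between neighbours
    ports = ["A"] + pipes + ["D"]
    lines = [f"void {top}(bit32 A{dims}, bit32 D{dims}) {{",
             "  #pragma HLS dataflow"]
    for p in pipes:
        lines.append(f"  bit32 {p}{dims};")
        # depth=1 is a placeholder value for this sketch
        lines.append(f"  #pragma HLS stream variable={p} depth=1")
    for k, s in enumerate(stages):
        lines.append(f"  stage_{s}({ports[k]}, {ports[k + 1]});")
    lines.append("}")
    return "\n".join(lines)

def emit_kernel(top: str, stages: List[str], shape: Tuple[int, int]) -> str:
    """Concatenate the per-stage functions followed by the top function."""
    parts = [emit_stage(s, shape) for s in stages] + [emit_top(top, stages, shape)]
    return "\n\n".join(parts)

code = emit_kernel("test", ["B", "C", "D"], (1, 10))
print(code)
```

With explicit function calls, the stage boundaries no longer depend on how the tool treats the trip-count-1 loops inside each function, which is the motivation behind the suggestion.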
@chhzh123 please go ahead and remove that logic in your code for now. I'll need to double-check to see if that logic is indeed needed.
Some loops with trip count one cannot be eliminated by the current simplification logic (see the link below) when hcl.compute is accompanied by the attrs argument, which is very common in the current hlib implementation.
https://github.com/cornell-zhang/heterocl/blob/d3173471e877c32fd9327e882575499c46f10f69/tvm/HalideIR/src/arithmetic/Simplify.cpp#L4795-L4798
This may cause errors when Vivado HLS automatically unrolls the loops and blurs the boundary of the dataflow region.
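The behaviour described here can be illustrated with a toy model. The sketch below is not the HalideIR code at the link above; `For`, `Store`, `simplify` and the `keep_annotated` flag are hypothetical names for a simplified IR. It shows how a rule that skips annotated loops leaves a trip-count-1 loop in place, while removing the rule lets the loop be folded into its body.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Union

@dataclass
class Store:
    """Toy statement kept as plain text, e.g. 'B[i][j] = A[i][j] + 1'."""
    text: str

@dataclass
class For:
    """Toy loop: var runs over [lo, lo + extent)."""
    var: str
    lo: int
    extent: int
    body: List["Stmt"]
    annotations: Dict[str, str] = field(default_factory=dict)

Stmt = Union[Store, For]

def substitute(stmts: List[Stmt], var: str, value: int) -> List[Stmt]:
    """Replace whole-word uses of var with a constant value."""
    out: List[Stmt] = []
    for s in stmts:
        if isinstance(s, Store):
            out.append(Store(re.sub(rf"\b{re.escape(var)}\b", str(value), s.text)))
        elif s.var == var:
            out.append(s)  # an inner loop rebinding var shadows the outer one
        else:
            out.append(For(s.var, s.lo, s.extent,
                           substitute(s.body, var, value), dict(s.annotations)))
    return out

def simplify(stmts: List[Stmt], keep_annotated: bool) -> List[Stmt]:
    """Drop trip-count-1 loops; optionally keep those that carry annotations."""
    out: List[Stmt] = []
    for s in stmts:
        if isinstance(s, Store):
            out.append(s)
            continue
        body = simplify(s.body, keep_annotated)
        if s.extent == 1 and not (keep_annotated and s.annotations):
            out.extend(substitute(body, s.var, s.lo))  # inline the single iteration
        else:
            out.append(For(s.var, s.lo, s.extent, body, dict(s.annotations)))
    return out

# Stage B from the example: the outer loop has extent 1 and an 'app' attribute.
nest = [For("i", 0, 1,
            [For("j", 0, 10, [Store("B[i][j] = A[i][j] + 1")])],
            {"app": "B"})]
kept = simplify(nest, keep_annotated=True)      # outer loop survives
dropped = simplify(nest, keep_annotated=False)  # outer loop is folded away
```

In this toy model, dropping the annotated loop also drops its annotations, which is the trade-off raised earlier in the thread about losing attribute information.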