Open chhzh123 opened 4 years ago
This should be fixed with #206, which we will merge soon. Thanks for filing the issue.
The error is caused by the IR pass lift_allocate_attrs creating an invalid block statement. But the root cause occurs before that.
The bug has been fixed in #206, and I am able to run your example. However, the output is not quite as expected: I can still see an allocate statement for the partitioned buffer:
// attr [_top] storage_scope = "global"
allocate _top[int32 * 1]
produce _top {
  // attr [0] extern_scope = 0
  // attr [B] storage_scope = "global"
  allocate B[int32 * 10 * 10]
  array partition variable=B complete factor=0 dim=0
  produce B {
    // attr [0] extern_scope = 0
    // attr [B.partitioned] storage_scope = "global"
    allocate B.partitioned[int32 * 1]
    for (x, 0, 10) {
      for (y, 0, 10) {
        B[(y + (x*10))] = (A[(y + (x*10))] + 1)
      }
    }
  }
  produce C {
    // attr [0] extern_scope = 0
    for (x, 0, 10) {
      for (y, 0, 10) {
        C[(y + (x*10))] = (B[(y + (x*10))] + 1)
      }
    }
  }
}
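For anyone who wants to check a lowered IR dump for this symptom without reading it by eye, here is a minimal sketch in plain Python. It is not part of HeteroCL: the helper name find_partitioned_allocations is hypothetical, and it assumes the IR is printed in the same textual form as above, with leftover buffers named with a ".partitioned" or "_partitioned" suffix.

```python
import re

# Matches printed IR lines such as "allocate B.partitioned[int32 * 1]".
# Assumption: the dump uses the same textual format as the IR in this thread.
ALLOCATE_RE = re.compile(r"^\s*allocate\s+([\w.]+)\[")

# Suffixes seen in this thread for the leftover partition buffers.
PARTITION_SUFFIXES = (".partitioned", "_partitioned")


def find_partitioned_allocations(ir_text):
    """Return the names of allocated buffers that look like partition leftovers."""
    names = []
    for line in ir_text.splitlines():
        match = ALLOCATE_RE.match(line)
        if match and match.group(1).endswith(PARTITION_SUFFIXES):
            names.append(match.group(1))
    return names


# A shortened copy of the dump above, used only as sample input.
ir_dump = """\
allocate B[int32 * 10 * 10]
array partition variable=B complete factor=0 dim=0
produce B {
  allocate B.partitioned[int32 * 1]
}
"""

print(find_partitioned_allocations(ir_dump))
```

A non-empty result means the dump still contains an allocate statement for a partitioned buffer, which is the unexpected behavior described here.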
Here is the test case I added before: https://github.com/Hecmay/heterocl/blob/stream_to/tests/test_schedule_stream.py#L531
Yes, it's weird... I see many xxx_partitioned variables before each loop.
I need to look into this pass: https://github.com/cornell-zhang/heterocl/blob/master/tvm/src/pass/lift_allocate_attrs.cc
It seems that some assumptions are not satisfied...
A simple example (modified from the tutorial) causes a runtime error. It runs when partitioning the A array or the C array, so only inner tensors cause the problem. I have also tried s.partition(kernel.B._op), but that does not work either.