Closed Huyuwei closed 6 years ago
@seanlatias There is an issue now: tvm's reorder schedule in some cases doesn't work for ExternOp. The solution is that I only keep the relation node and commented out all other code. Is it ok to do this? Is the schedule interface for ComputeOp already broken?
@Huyuwei can you provide more detail on what errors did you run into with ExternOp?
For your question, I cannot understand. Can you give me an example that causes the error? What do you mean by "TVM's reorder schedule"? Is that different from our reorder schedule? We are no longer using TVM's ComputeOp anymore so it does not matter if it is broken or not. We will finally remove all ComputeOp related stuff.
The error happens here https://github.com/cornell-zhang/heterocl/blob/master/tvm/src/schedule/schedule_lang.cc#L295
temp2.emplace_back(self->iter_var_exprs[pos[i]]);
Using the tiling example above, after two splits, the size of pos is 4 (i.outer, i.inner, j.outer, j.inner), but the size of iter_var_exprs is 2 (i, j), so segmentation fault happens.
This is because you did not add the corresponding iter_var_exprs
when performing splitting. This is where you should add those.
You can check my previous commits (such as Fuse and Reorder) to see how I added them.
The Fuse example can be seen here.
@seanlatias Thanks, the bug caused by iter_var_exprs is fixed now.
Condition statement (if likely (...)
) will be inserted when the split factor is not a divisor of the loop bound. I will let you know when I fix this part.
We don't need the "likely" for now. Just adding an IF statement is enough.
Now IF stmt is inserted. Above split and reorder example will generate ir:
produce c {
// attr [0] extern_scope = 0
for (i.outer, 0, 4) {
for (j.outer, 0, 7) {
for (i.inner, 0, 3) {
for (j.inner, 0, 3) {
if (((i.outer*3) < (10 - i.inner))) {
if (((j.outer*3) < (20 - j.inner))) {
c[(((j.outer*3) + j.inner) + (((i.outer*3) + i.inner)*20))] = int32((int34(a[(((j.outer*3) + j.inner) + (((i.outer*3) + i.inner)*20))]) + int34(b[(((j.outer*3) + j.inner) + (((i.outer*3) + i.inner)*20))])))
}
}
}
}
}
}
}
See https://github.com/Huyuwei/heterocl/blob/schedule/heterocl/tests/test_schedule.py#L70-L86
@seanlatias The issue of IF stmt position is fixed. Now consider the code:
def popcount(A, B): # each element in A is a 32-bit integer
with hcl.for_(0, A.shape[0], name="x") as x:
with hcl.for_(0, A.shape[1], name="y") as y:
B[x, y] = 0
with hcl.for_(0, 32) as i:
B[x, y] += A[x, y][i]
A = hcl.placeholder((10, 20))
B = hcl.placeholder(A.shape)
with hcl.stage() as C:
popcount(A, B)
s = hcl.create_schedule(C)
s[C].split(C.x, factor=3)
ir = hcl.lower(s, [A, B])
It generates IR:
// attr [stage2] storage_scope = "global"
allocate stage2[int32 * 1]
produce stage2 {
// attr [0] extern_scope = 0
for (x.outer, 0, 4) {
for (x.inner, 0, 3) {
if (((x.outer*3) < (10 - x.inner))) {
for (y, 0, 20) {
placeholder1[(y + (((x.outer*3) + x.inner)*20))] = 0
for (i, 0, 32) {
placeholder1[(y + (((x.outer*3) + x.inner)*20))] = int32((int34(placeholder1[(y + (((x.outer*3) + x.inner)*20))]) + int34(placeholder0[(y + (((x.outer*3) + x.inner)*20))][i])))
}
}
}
}
}
}
This is covered in test case: https://github.com/Huyuwei/heterocl/blob/schedule/heterocl/tests/test_imperative_code.py#L36-L45
@Huyuwei Can you briefly describe what you have changed in this update? Also, what's the purpose for "CollectIfCondition"?
Now each IF stmt is associated with a For loop. When we reorder several For loops, associated IF stmt will also be reordered.
In the following code, condition_a is associated with i, and condition_b is associated with j.
for (i, 0, 10):
if (condition_a):
for (j, 0, 20):
if(condition_b):
body ...
If we reorder i and j, we will get:
for (j, 0, 20):
if (condition_b):
for (i, 0, 10):
if(condition_a):
body ...
CollectIfConditions is to build a map with loop_var as the key and condition as the value.
I don't think this is correct. You cannot associate each IF condition with a FOR loop. Only the IF conditions generated by the SPLIT schedule will be associated with a FOR loop. Again, we need to think the most complicated case, where the users can have lots of imperative statements inside a FOR loop. It is normal to write an IF condition inside the FOR loop. With your current algorithm, the conditions written by the users will be forced to move with loop reordering. Although it is not always safe to interchange loops, it is not our duty to check.
You also remove the check for whether the reorder can be performed or not. You need to add that back.
@seanlatias Two updates:
Now CollectIfConditions collect only IF stmt generated by the split schedule
IF stmt is ensured to be below both inner and outer for loops after reorder.
Consider the code:
a = hcl.placeholder((10, 20), name="a")
b = hcl.placeholder((10, 20), name="b")
c = hcl.compute(a.shape, lambda i, j: a[i, j] + b[i, j], name="c")
s = hcl.create_schedule(c)
yo, yi = s[c].split(c.axis[0], factor=3, mode="transform")
xo, xi = s[c].split(c.axis[1], factor=3, mode="transform")
s[c].reorder(yi, xi, yo, xo)
ir = hcl.lower(s, [a, b, c])
The IR is:
produce c {
// attr [0] extern_scope = 0
for (i.inner, 0, 3) {
for (j.inner, 0, 3) {
for (i.outer, 0, 4) {
if (((i.outer*3) < (10 - i.inner))) {
for (j.outer, 0, 7) {
if (((j.outer*3) < (20 - j.inner))) {
c[(((j.outer*3) + j.inner) + (((i.outer*3) + i.inner)*20))] = int32((int34(a[(((j.outer*3) + j.inner) + (((i.outer*3) + i.inner)*20))]) + int34(b[(((j.outer*3) + j.inner) + (((i.outer*3) + i.inner)*20))])))
}
}
}
}
}
}
}
@seanlatias I just realized that in reorder, the axes can be non-consecutive. Axes not in the reorder list will stay the same position. See the test case: https://github.com/cornell-zhang/heterocl/blob/80cb5f434622c63ec1d3687dd1424090559f7f8e/heterocl/tests/test_schedule.py#L53-L61
This PR adds split schedule primitive. It has two modes:
transform
changes the IR structure (nested loops), the same as in tvmannotate
adds attributes, without changing the IR structureTransform mode example
python code:
generated IR:
Annotate mode example
python code:
generated IR:
Do tiling with split and reorder
Now we can use split and reorder to do tiling:
generated IR:
Can add a wrapper to directly support tiling as a schedule primitive.