Closed by zzzDavid 2 years ago
Similar to #10. Your expression may not be affine, in which case the loop should use scf.for. Running the generated generic MLIR program through hcl-opt can help you locate the problem.
The case that triggers this error is a two-level loop nest in which the inner loop's bound may change as the outer loop iterates, e.g.:
#map1 = affine_map<(d0) -> (d0)>
module {
  func @affine_for_lower_bound_invalid_dim(%arg : index, %arg1 : i32) {
    affine.for %n0 = 0 to 7 {
      %cst_1 = arith.constant 1 : index
      %dim = arith.addi %arg, %cst_1 : index
      affine.for %n1 = 0 to #map1(%dim) {
      }
    }
    return
  }
}
This is quite common for VTA, where many loop bounds are values read from the instruction. A minimal example is:
import heterocl as hcl

def kernel(instr):
    with hcl.for_(0, 10, tag='i') as i:
        x_size = hcl.scalar(instr[i][5:10], name="y_size")
        def fcompute(i):
            pass
        hcl.mutate((x_size.v,), fcompute, 'compute')

instr = hcl.placeholder((10,))
s = hcl.create_schedule([instr], kernel)
print(hcl.lower(s))
The output:
error: 'affine.for' op operand cannot be used as a dimension id
// Verification failed, printing generic form
#map0 = affine_map<(d0) -> (d0)>
#map1 = affine_map<() -> (0)>
#map2 = affine_map<() -> (10)>
"builtin.module"() ({
"builtin.func"() ({
^bb0(%arg0: memref<10xi32>):
"affine.for"() ({
^bb0(%arg1: index):
%0 = "affine.load"(%arg0, %arg1) {from = "compute_0", map = #map0} : (memref<10xi32>, index) -> i32
%1 = "arith.constant"() {value = 9 : index} : () -> index
%2 = "arith.constant"() {value = 5 : index} : () -> index
%3 = "hcl.get_slice"(%0, %1, %2) : (i32, index, index) -> i32
%4 = "memref.alloc"() {name = "y_size", operand_segment_sizes = dense<0> : vector<2xi32>} : () -> memref<1xi32>
%5 = "arith.constant"() {value = 0 : index} : () -> index
"affine.store"(%3, %4) {map = #map1, to = "y_size"} : (i32, memref<1xi32>) -> ()
%6 = "arith.constant"() {value = 0 : index} : () -> index
%7 = "affine.load"(%4) {from = "y_size", map = #map1} : (memref<1xi32>) -> i32
%8 = "hcl.create_loop_handle"() {loop_name = "i"} : () -> !hcl.LoopHandle
%9 = "arith.index_cast"(%7) : (i32) -> index
"affine.for"(%9) ({
^bb0(%arg2: index):
"affine.yield"() : () -> ()
}) {loop_name = "i", lower_bound = #map1, stage_name = "compute", step = 1 : i32, upper_bound = #map0} : (index) -> ()
%10 = "hcl.create_stage_handle"() {stage_name = "compute"} : () -> !hcl.StageHandle
%11 = "memref.alloc"() {name = "compute", operand_segment_sizes = dense<0> : vector<2xi32>} : () -> memref<1xi32>
"affine.yield"() : () -> ()
}) {loop_name = "loop_0", lower_bound = #map1, stage_name = "i", step = 1 : i32, upper_bound = #map2} : () -> ()
"std.return"() : () -> ()
}) {bit, extra_itypes = "s", extra_otypes = "", sym_name = "top", type = (memref<10xi32>) -> ()} : () -> ()
}) : () -> ()
We should use scf.for for such cases.
Basically, we prefer affine.for over scf.for since affine structure is easier to optimize. We should first check whether the loop bounds are affine, then select the corresponding implementation. You can check the make_if function in build_ir.py, which provides a systematic approach to these cases. The idea is to build the expression in general SSA form first, then traverse back over it to see whether it is affine. If it is affine, roll back by removing the built expressions and reconstruct the expression as an affine map.
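To make the "check first, then pick the loop op" idea concrete, here is a minimal sketch in plain Python. It is not the code in build_ir.py: the classes `Const`, `LoopVar`, `Load`, and `BinOp` and the functions `is_affine` and `select_loop_op` are hypothetical names made up for illustration. It also uses a simplified notion of "affine" (sums of loop variables scaled by constants) and skips the roll-back and rebuild step.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy expression tree; not HeteroCL's or MLIR's data structures.
@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class LoopVar:
    """Induction variable of an enclosing loop."""
    name: str


@dataclass(frozen=True)
class Load:
    """Value read from a buffer at run time (e.g. a field of an instruction)."""
    buffer: str


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "-", "*"
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Const, LoopVar, Load, BinOp]


def is_constant(expr: Expr) -> bool:
    """True if the expression is built only from constants."""
    if isinstance(expr, Const):
        return True
    if isinstance(expr, BinOp):
        return is_constant(expr.lhs) and is_constant(expr.rhs)
    return False


def is_affine(expr: Expr) -> bool:
    """Traverse the expression and decide whether it is affine (simplified rule)."""
    if isinstance(expr, (Const, LoopVar)):
        return True
    if isinstance(expr, Load):
        # A value loaded at run time is not known to be affine.
        return False
    if isinstance(expr, BinOp):
        if expr.op in ("+", "-"):
            return is_affine(expr.lhs) and is_affine(expr.rhs)
        if expr.op == "*":
            # Multiplication stays affine only if one side is a constant.
            return (is_constant(expr.lhs) and is_affine(expr.rhs)) or (
                is_affine(expr.lhs) and is_constant(expr.rhs)
            )
    return False


def select_loop_op(upper_bound: Expr) -> str:
    """Pick the loop operation name based on the bound expression."""
    return "affine.for" if is_affine(upper_bound) else "scf.for"


if __name__ == "__main__":
    print(select_loop_op(BinOp("+", LoopVar("i"), Const(1))))  # affine.for
    print(select_loop_op(Load("y_size")))                      # scf.for
```

In this toy model, the VTA case above corresponds to `Load("y_size")`: the bound comes from a value loaded inside the outer loop, so the sketch falls back to scf.for.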
I'm trying to generate IR for VTA. So far the whole IR can be generated, but it fails verification. I'm trying to isolate which part causes this error, and I will provide more information once I can reproduce it on a smaller example.
The error message is:
@chhzh123 Have you seen anything like this?