cornell-zhang / hcl-dialect

HeteroCL-MLIR dialect for accelerator design
https://cornell-zhang.github.io/heterocl/index.html

[Verification Failure] `affine.for` op operand cannot be used as a dimension id #70

Closed zzzDavid closed 2 years ago

zzzDavid commented 2 years ago

I'm trying to generate IR for VTA. So far the whole IR can be generated, but it fails verification. I'm trying to isolate which part causes the error, and I will provide more information once I can reproduce it on a smaller example.

The error message is:

error: `affine.for` op operand cannot be used as a dimension id
// Verification failed, printing generic form

@chhzh123 Have you seen anything like this?

chhzh123 commented 2 years ago

This looks similar to #10. Maybe your loop-bound expression is not affine; in that case you should use scf.for.

chhzh123 commented 2 years ago

Running the generated generic MLIR program through hcl-opt can help you locate the problem.

zzzDavid commented 2 years ago

The error is triggered by a two-level loop nest in which the inner loop's bound can change as the outer loop iterates.

e.g.

#map1 = affine_map<(d0) -> (d0)>
module{
func @affine_for_lower_bound_invalid_dim(%arg : index, %arg1 : i32) {
  affine.for %n0 = 0 to 7 {
    %cst_1 = arith.constant 1 : index 
    %dim = arith.addi %arg, %cst_1 : index
    affine.for %n1 = 0 to #map1(%dim) {
    }
  }
  return
}
}

This is quite common for VTA, where lots of loop bounds are values from the instruction. A minimal example is:

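import heterocl as hcl

def kernel(instr):
    with hcl.for_(0, 10, tag='i') as i:
        # The inner loop bound is a bit slice of the instruction, known only at run time.
        x_size = hcl.scalar(instr[i][5:10], name="y_size")
        def fcompute(i):
            pass
        # x_size.v is used as the extent of the inner loop, which is built as affine.for.
        hcl.mutate((x_size.v,), fcompute, 'compute')

instr = hcl.placeholder((10,))
s = hcl.create_schedule([instr], kernel)
print(hcl.lower(s))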

The output:

error: 'affine.for' op operand cannot be used as a dimension id
// Verification failed, printing generic form
#map0 = affine_map<(d0) -> (d0)>
#map1 = affine_map<() -> (0)>
#map2 = affine_map<() -> (10)>
"builtin.module"() ({
  "builtin.func"() ({
  ^bb0(%arg0: memref<10xi32>):
    "affine.for"() ({
    ^bb0(%arg1: index):
      %0 = "affine.load"(%arg0, %arg1) {from = "compute_0", map = #map0} : (memref<10xi32>, index) -> i32
      %1 = "arith.constant"() {value = 9 : index} : () -> index
      %2 = "arith.constant"() {value = 5 : index} : () -> index
      %3 = "hcl.get_slice"(%0, %1, %2) : (i32, index, index) -> i32
      %4 = "memref.alloc"() {name = "y_size", operand_segment_sizes = dense<0> : vector<2xi32>} : () -> memref<1xi32>
      %5 = "arith.constant"() {value = 0 : index} : () -> index
      "affine.store"(%3, %4) {map = #map1, to = "y_size"} : (i32, memref<1xi32>) -> ()
      %6 = "arith.constant"() {value = 0 : index} : () -> index
      %7 = "affine.load"(%4) {from = "y_size", map = #map1} : (memref<1xi32>) -> i32
      %8 = "hcl.create_loop_handle"() {loop_name = "i"} : () -> !hcl.LoopHandle
      %9 = "arith.index_cast"(%7) : (i32) -> index
      "affine.for"(%9) ({
      ^bb0(%arg2: index):
        "affine.yield"() : () -> ()
      }) {loop_name = "i", lower_bound = #map1, stage_name = "compute", step = 1 : i32, upper_bound = #map0} : (index) -> ()
      %10 = "hcl.create_stage_handle"() {stage_name = "compute"} : () -> !hcl.StageHandle
      %11 = "memref.alloc"() {name = "compute", operand_segment_sizes = dense<0> : vector<2xi32>} : () -> memref<1xi32>
      "affine.yield"() : () -> ()
    }) {loop_name = "loop_0", lower_bound = #map1, stage_name = "i", step = 1 : i32, upper_bound = #map2} : () -> ()
    "std.return"() : () -> ()
  }) {bit, extra_itypes = "s", extra_otypes = "", sym_name = "top", type = (memref<10xi32>) -> ()} : () -> ()
}) : () -> ()

zzzDavid commented 2 years ago

We should use scf.for for such cases.

chhzh123 commented 2 years ago

In general we prefer affine.for over scf.for, since affine structure is easier to optimize. We should first check whether the loop bounds are affine, then select the matching implementation. See the make_if function in build_ir.py, which provides a systematic approach to these cases. The idea is to build the expression in general SSA form first, then traverse it backward to check whether it is affine. If it is affine, roll back by removing the built expressions, then reconstruct the expression using an affine map.
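The check-then-select idea can be sketched on a toy expression tree. This is not the make_if code from build_ir.py; the classes `Const`, `LoopVar`, `Opaque`, `BinOp` and the helpers `is_affine` and `select_loop_op` are hypothetical names made up for illustration. It also simplifies the affine rules to `+`, `-`, and multiplication by a constant, whereas real MLIR affine expressions additionally allow symbols and `floordiv`, `ceildiv`, and `mod` by constants.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    """An integer literal."""
    value: int


@dataclass(frozen=True)
class LoopVar:
    """Induction variable of an enclosing affine loop."""
    name: str


@dataclass(frozen=True)
class Opaque:
    """A value we cannot see through, e.g. one loaded from memory."""
    name: str


@dataclass(frozen=True)
class BinOp:
    """A binary operation; only "+", "-", and "*" are modeled here."""
    op: str
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Const, LoopVar, Opaque, BinOp]


def is_constant(e: Expr) -> bool:
    """True if the expression is built only from integer literals."""
    if isinstance(e, Const):
        return True
    if isinstance(e, BinOp) and e.op in ("+", "-", "*"):
        return is_constant(e.lhs) and is_constant(e.rhs)
    return False


def is_affine(e: Expr) -> bool:
    """Walk the expression back from its root and check affineness."""
    if isinstance(e, (Const, LoopVar)):
        return True
    if isinstance(e, BinOp):
        if e.op in ("+", "-"):
            return is_affine(e.lhs) and is_affine(e.rhs)
        if e.op == "*":
            # A product is affine only if one side is a constant.
            return (is_constant(e.lhs) and is_affine(e.rhs)) or (
                is_affine(e.lhs) and is_constant(e.rhs)
            )
    # Opaque values and unmodeled operations are treated as non-affine.
    return False


def select_loop_op(bound: Expr) -> str:
    """Pick the loop operation name for a given loop bound."""
    return "affine.for" if is_affine(bound) else "scf.for"


# 2 * i + 1 is affine in the outer induction variable i.
affine_bound = BinOp("+", BinOp("*", Const(2), LoopVar("i")), Const(1))
# A bound loaded from a buffer, like y_size in the example above, is opaque.
loaded_bound = Opaque("y_size")

print(select_loop_op(affine_bound))
print(select_loop_op(loaded_bound))
```

In this sketch the check runs before any loop is emitted, so a bound that was loaded from memory falls back to scf.for instead of failing verification later.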