cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0

Vectorization IR Modification Proposal #432

Open AlgaPeng opened 2 years ago

AlgaPeng commented 2 years ago

If a variable accessed in the vectorized loop is also used elsewhere in the program, we run into a type inconsistency. For example, the following test case:

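import heterocl as hcl

def vec_auto_3D():
  target = hcl.Platform.aws_f1  # note: unused below; the build call targets "vhls"
  A = hcl.placeholder((2,3,4),'A')
  # B copies A; C reads B and adds 1
  B = hcl.compute(A.shape, lambda x,y,z: A[x,y,z], name="B")
  C = hcl.compute(A.shape, lambda x,y,z: B[x,y,z]+1, name="C")
  s = hcl.create_schedule([A,B,C])
  # vectorize only the innermost (z) axis of stage C
  s[C].vectorize(C.axis[2])
  f = hcl.build(s, target = "vhls")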

generates the following IR:

produce B {
  // attr [0] extern_scope = 0
  for "stage_name"="B" (x, 0, 2) {
    for "stage_name"="B" (y, 0, 3) {
      for "stage_name"="B" (z, 0, 4) {
        B[((z + (y*4)) + (x*12))] = A[((z + (y*4)) + (x*12))]
      }
    }
  }
}
produce C {
  // attr [0] extern_scope = 0
  for "stage_name"="C" (x, 0, 2) {
    for "stage_name"="C" (y, 0, 3) {
      C[ramp(((y + (x*3))*4), 1, 4)] = (B[ramp(((y + (x*3))*4), 1, 4)] + x4(1))
    }
  }
}
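For reference, the ramp index in stage C addresses the same flat elements as the scalar index in stage B, only four at a time. The following is a minimal plain-Python sketch of that equivalence. It assumes ramp(base, stride, lanes) denotes the indices base, base + stride, ..., with lanes entries in total; the `ramp` helper here is a hypothetical stand-in for illustration, not a HeteroCL function.

```python
def ramp(base, stride, lanes):
    """Hypothetical stand-in for the IR's ramp node: returns the list
    [base, base + stride, ..., base + stride * (lanes - 1)]."""
    return [base + stride * i for i in range(lanes)]


SHAPE = (2, 3, 4)  # same shape as placeholder A in the test case


def indices_match():
    """Check that, for every (x, y), the 4-lane ramp used in stage C covers
    exactly the flat indices that stage B visits with its scalar z loop."""
    for x in range(SHAPE[0]):
        for y in range(SHAPE[1]):
            scalar_indices = [z + y * 4 + x * 12 for z in range(SHAPE[2])]
            vector_indices = ramp((y + x * 3) * 4, 1, 4)
            if scalar_indices != vector_indices:
                return False
    return True


if __name__ == "__main__":
    print(indices_match())
```

So the two stages touch the same memory; the difference is only in whether each access is typed as a scalar or as a 4-lane vector.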

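In this case the type of B is not consistent throughout the program: stage B writes it element by element as a scalar array, while stage C reads it through a 4-lane ramp as a vector. To deal with this, we have to modify the IR. We propose two ways of making it consistent: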

  1. We vectorize C but not B, and generate a warning to inform the user that, although the target variable is vectorized, it does not take advantage of SIMD
    produce B {
      // attr [0] extern_scope = 0
      for "stage_name"="B" (x, 0, 2) {
        for "stage_name"="B" (y, 0, 3) {
          for "stage_name"="B" (z, 0, 4) {
            B[((z + (y*4)) + (x*12))] = A[((z + (y*4)) + (x*12))]
          }
        }
      }
    }
    produce C {
      // attr [0] extern_scope = 0
      for "stage_name"="C" (x, 0, 2) {
        for "stage_name"="C" (y, 0, 3) {
          for "stage_name"="C" (z, 0, 4) {
            C[ramp(((y + (x*3))*4), 1, 4)][z] = (B[((z + (y*4)) + (x*12))] + x4(1))
          }
        }
      }
    }
  2. We vectorize every variable in the loop that is marked as vectorized
    produce B {
      // attr [0] extern_scope = 0
      for "stage_name"="B" (x, 0, 2) {
        for "stage_name"="B" (y, 0, 3) {
          for "stage_name"="B" (z, 0, 4) {
            B[ramp(((y + (x*3))*4), 1, 4)][z] = A[((z + (y*4)) + (x*12))]
          }
        }
      }
    }
    produce C {
      // attr [0] extern_scope = 0
      for "stage_name"="C" (x, 0, 2) {
        for "stage_name"="C" (y, 0, 3) {
          C[ramp(((y + (x*3))*4), 1, 4)] = (B[ramp(((y + (x*3))*4), 1, 4)] + x4(1))
        }
      }
    }

We would appreciate suggestions on which of the two approaches is preferable. Thank you! @zhangzhiru @Hecmay
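To make the comparison concrete, here is a rough plain-Python emulation of the two proposed IR shapes. It is only a sketch under assumptions: vectors are modeled as lists of flat indices, `ramp`, `option1`, and `option2` are hypothetical names, and nothing here calls HeteroCL or reflects how its code generator would actually lower the IR. It only checks that both variants compute the same B and C.

```python
LANES = 4
SIZE = 2 * 3 * 4  # flattened size of the (2, 3, 4) tensors


def ramp(base, stride, lanes):
    """Hypothetical stand-in for the IR's ramp node."""
    return [base + stride * i for i in range(lanes)]


def option1(A):
    """Option 1: B stays scalar; C's 4-lane vector is filled lane by lane."""
    B = [0] * SIZE
    C = [0] * SIZE
    for x in range(2):
        for y in range(3):
            for z in range(4):
                B[z + y * 4 + x * 12] = A[z + y * 4 + x * 12]
    for x in range(2):
        for y in range(3):
            lanes = ramp((y + x * 3) * 4, 1, LANES)
            for z in range(4):
                # scalar read of B, write into lane z of C's vector
                C[lanes[z]] = B[z + y * 4 + x * 12] + 1
    return B, C


def option2(A):
    """Option 2: B is also treated as a vector; C is computed a vector at a time."""
    B = [0] * SIZE
    C = [0] * SIZE
    for x in range(2):
        for y in range(3):
            lanes = ramp((y + x * 3) * 4, 1, LANES)
            for z in range(4):
                # write into lane z of B's vector from a scalar read of A
                B[lanes[z]] = A[z + y * 4 + x * 12]
    for x in range(2):
        for y in range(3):
            lanes = ramp((y + x * 3) * 4, 1, LANES)
            vec = [B[i] + 1 for i in lanes]  # whole-vector add of x4(1)
            for i, v in zip(lanes, vec):
                C[i] = v
    return B, C


if __name__ == "__main__":
    A = list(range(SIZE))
    print(option1(A) == option2(A))
```

In this emulation the two options are functionally identical, so the choice between them comes down to which accesses end up typed as vectors in the generated code.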