Vectorization IR Modification Proposal

If a tensor that gets vectorized is also used elsewhere in the program, its type becomes inconsistent. Consider the following test case, which vectorizes the innermost axis of C:

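import heterocl as hcl

def vec_auto_3D():
  hcl.init()
  target = hcl.Platform.aws_f1
  A = hcl.placeholder((2,3,4), "A")
  # B copies A element by element; C adds 1 to every element of B
  B = hcl.compute(A.shape, lambda x,y,z: A[x,y,z], name="B")
  C = hcl.compute(A.shape, lambda x,y,z: B[x,y,z]+1, name="C")
  s = hcl.create_schedule([A,B,C])
  # vectorize only the innermost axis (extent 4) of stage C
  s[C].vectorize(C.axis[2])
  f = hcl.build(s, target = "vhls")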

which generates the following IR:

produce B {
  // attr [0] extern_scope = 0
  for "stage_name"="B" (x, 0, 2) {
    for "stage_name"="B" (y, 0, 3) {
      for "stage_name"="B" (z, 0, 4) {
        B[((z + (y*4)) + (x*12))] = A[((z + (y*4)) + (x*12))]
      }
    }
  }
}
produce C {
  // attr [0] extern_scope = 0
  for "stage_name"="C" (x, 0, 2) {
    for "stage_name"="C" (y, 0, 3) {
      C[ramp(((y + (x*3))*4), 1, 4)] = (B[ramp(((y + (x*3))*4), 1, 4)] + x4(1))
    }
  }
}
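To make the mismatch concrete, here is a small pure-Python sketch. It is our own illustration, not HeteroCL code. It expands the ramp(base, stride, lanes) index used in stage C into the flat offsets it denotes, then checks them against the offsets written by B's scalar z-loop. The helper names ramp, scalar_offsets and vector_offsets are hypothetical.

```python
def ramp(base, stride, lanes):
    """Indices denoted by the IR node ramp(base, stride, lanes)."""
    return [base + stride * i for i in range(lanes)]

def scalar_offsets(x, y, extent_z=4):
    """Flat offsets touched by B's z-loop: (z + (y*4)) + (x*12)."""
    return [(z + (y * 4)) + (x * 12) for z in range(extent_z)]

def vector_offsets(x, y, lanes=4):
    """Flat offsets touched by C's access B[ramp(((y + (x*3))*4), 1, 4)]."""
    return ramp((y + (x * 3)) * 4, 1, lanes)

# For every (x, y) both stages touch the same 4 consecutive elements;
# only the access type differs (4 scalar accesses vs. one 4-lane vector).
for x in range(2):
    for y in range(3):
        assert scalar_offsets(x, y) == vector_offsets(x, y)

print(vector_offsets(1, 2))  # -> [20, 21, 22, 23]
```

The offsets agree for every (x, y), so the problem is not which elements of B are accessed. The problem is that B is stored as scalars but loaded as a vector.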

Here B is written one scalar element at a time in stage B but read as a 4-lane vector (B[ramp(...)]) in stage C, so its type is not consistent throughout the program. Resolving this requires modifying the IR, and we propose two ways to make the type consistent:

We vectorize C but not B, and emit a warning informing the user that although the target variable is vectorized, it does not take advantage of SIMD:

produce B {
  // attr [0] extern_scope = 0
  for "stage_name"="B" (x, 0, 2) {
    for "stage_name"="B" (y, 0, 3) {
      for "stage_name"="B" (z, 0, 4) {
        B[((z + (y*4)) + (x*12))] = A[((z + (y*4)) + (x*12))]
      }
    }
  }
}
produce C {
  // attr [0] extern_scope = 0
  for "stage_name"="C" (x, 0, 2) {
    for "stage_name"="C" (y, 0, 3) {
      for "stage_name"="C" (z, 0, 4) {
        C[ramp(((y + (x*3))*4), 1, 4)][z] = (B[((z + (y*4)) + (x*12))] + 1)
      }
    }
  }
}
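The following is a minimal pure-Python sketch of this lowering. It assumes that flat Python lists stand in for buffers and that a list of 4-element lists stands in for C's vector type. lower_option1 and the warning text are hypothetical and are not HeteroCL APIs. Vector slot y + x*3 corresponds to the ramp base (y + x*3)*4 divided by the 4 lanes.

```python
import warnings

X, Y, LANES = 2, 3, 4  # extents of the (2, 3, 4) test case

def lower_option1(A):
    """Hypothetical model: C gets a 4-lane vector type, B stays scalar."""
    # Stage B is unchanged: a scalar copy, B[(z + y*4) + x*12] = A[...]
    B = [0] * (X * Y * LANES)
    for x in range(X):
        for y in range(Y):
            for z in range(LANES):
                B[(z + y * 4) + x * 12] = A[(z + y * 4) + x * 12]
    # Stage C: one 4-lane vector per (x, y). The z-loop survives, so each
    # lane is computed separately from one scalar element of B.
    C = [[0] * LANES for _ in range(X * Y)]
    for x in range(X):
        for y in range(Y):
            for z in range(LANES):
                C[y + x * 3][z] = B[(z + y * 4) + x * 12] + 1
    # Tell the user that this vectorization brings no SIMD benefit.
    warnings.warn("C is vectorized, but its lanes are still computed one at a "
                  "time from scalar B, so it does not take advantage of SIMD")
    return B, C

A = list(range(X * Y * LANES))
B, C = lower_option1(A)
```

The result matches A + 1 elementwise. However, each vector of C is still filled by four scalar additions, which is why this approach warns the user.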

Alternatively, we vectorize every variable used in the loop marked for vectorization (here both B and C), so that B is stored with a vector type as well:

produce B {
  // attr [0] extern_scope = 0
  for "stage_name"="B" (x, 0, 2) {
    for "stage_name"="B" (y, 0, 3) {
      for "stage_name"="B" (z, 0, 4) {
        B[ramp(((y + (x*3))*4), 1, 4)][z] = A[((z + (y*4)) + (x*12))]
      }
    }
  }
}
produce C {
  // attr [0] extern_scope = 0
  for "stage_name"="C" (x, 0, 2) {
    for "stage_name"="C" (y, 0, 3) {
      C[ramp(((y + (x*3))*4), 1, 4)] = (B[ramp(((y + (x*3))*4), 1, 4)] + x4(1))
    }
  }
}
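A comparable pure-Python sketch of this second lowering follows, under the same assumption that lists of 4-element lists model vector-typed buffers. lower_option2, broadcast (a stand-in for x4(...)) and vadd (a stand-in for one SIMD add) are hypothetical helpers, not HeteroCL or TVM functions.

```python
X, Y, LANES = 2, 3, 4  # extents of the (2, 3, 4) test case

def broadcast(value, lanes=LANES):
    """Stand-in for x4(value): the scalar replicated across every lane."""
    return [value] * lanes

def vadd(a, b):
    """Lane-wise add of two vectors, standing in for one SIMD instruction."""
    return [p + q for p, q in zip(a, b)]

def lower_option2(A):
    """Hypothetical model: both B and C get a 4-lane vector type."""
    # Stage B keeps its z-loop but now writes lane z of vector slot y + x*3,
    # matching B[ramp(((y + (x*3))*4), 1, 4)][z] = A[...].
    B = [[0] * LANES for _ in range(X * Y)]
    for x in range(X):
        for y in range(Y):
            for z in range(LANES):
                B[y + x * 3][z] = A[(z + y * 4) + x * 12]
    # Stage C has no z-loop: one whole-vector add per (x, y), matching
    # C[ramp(...)] = B[ramp(...)] + x4(1).
    C = [None] * (X * Y)
    for x in range(X):
        for y in range(Y):
            C[y + x * 3] = vadd(B[y + x * 3], broadcast(1))
    return B, C

A = list(range(X * Y * LANES))
B, C = lower_option2(A)
```

Here stage C's inner loop disappears. Each (x, y) iteration performs a single 4-lane add, which is where SIMD can pay off. The cost is that B's storage type and stores change, even though B's own loop was never marked for vectorization.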

We would appreciate suggestions on which of these two approaches is preferable to implement. Thank you! @zhangzhiru @Hecmay

cornell-zhang/heterocl, issue #432