cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0

`reuse_at` returns wrong result with `hcl.select` #155

Open hecmay opened 4 years ago

hecmay commented 4 years ago

Here I want to reuse one input along two different dimensions:

import heterocl as hcl
import numpy as np

def test_reuse_select():
    hcl.init()
    A = hcl.placeholder((10, 10, 2))
    B = hcl.compute((10, 8, 2), lambda y, x, c:
            hcl.select(c==0, A[y, x, c]*1 + A[y, x+1, c]*1 + A[y, x+2, c]*1,
                             A[y, x, c]*3 + A[y, x+1, c]*5 + A[y, x+2, c]*6))
    s = hcl.create_schedule([A, B])
    RB = s.reuse_at(A, s[B], B.axis[1])
    f = hcl.build(s)

    np_A = np.random.randint(0, 10, size=(10, 10, 2))
    np_B = np.zeros((10, 8, 2), dtype="int")
    np_C = np.zeros((10, 8, 2), dtype="int")

    for y in range(0, 10):
        for x in range(0, 8):
            np_C[y][x][0] = np_A[y][x][0]*1 + np_A[y][x+1][0]*1 + np_A[y][x+2][0]*1
            np_C[y][x][1] = np_A[y][x][1]*3 + np_A[y][x+1][1]*5 + np_A[y][x+2][1]*6

    hcl_A = hcl.asarray(np_A)
    hcl_B = hcl.asarray(np_B)
    print(hcl.lower(s))

    f(hcl_A, hcl_B)

    np_B = hcl_B.asnumpy()

    assert np.array_equal(np_B, np_C)

The result does not match the ground truth when the reuse_at schedule is applied. The lowered IR is as follows:

produce compute0 {
  // attr [placeholder0.reuse] storage_scope = "global"
  allocate placeholder0.reuse[int32 * 1 * 3 * 2]
  // attr [0] extern_scope = 0
  for (y, 0, 10) {
    for (x.reuse, 0, 10) {
      for (c, 0, 2) {
        produce placeholder0.reuse {
          for (placeholder0.1, 0, 2) {
            placeholder0.reuse[(c + (placeholder0.1*3))] = placeholder0.reuse[((c + (placeholder0.1*3)) + 3)]
          }
          placeholder0.reuse[(c + 6)] = placeholder0[((c + (x.reuse*10)) + (y*20))]
        }
        if ((2 <= x.reuse)) {
          compute0[(((c + (x.reuse*2)) + (y*16)) + -4)] = int32(tvm_if_then_else((c == 0), (int34((int33(placeholder0.reuse[c]) + int33(placeholder0.reuse[(c + 3)]))) + int34(placeholder0.reuse[(c + 6)])), (int34((int33((placeholder0.reuse[c]*3)) + int33((placeholder0.reuse[(c + 3)]*5)))) + int34((placeholder0.reuse[(c + 6)]*6)))))
        }
      }
    }
  }
}
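For comparison, here is a minimal NumPy-only sketch of what a 3-element reuse window along x would be expected to compute for this test. It is not HeteroCL code and not the generated buffer; `sliding_window_reference` and `WEIGHTS` are hypothetical names introduced for illustration. As a side observation rather than a confirmed diagnosis: for a row-major `(10, 10, 2)` input the flat offset of `A[y, x, c]` is `y*20 + x*2 + c`, whereas the IR above loads from `c + x.reuse*10 + y*20` and steps through the 6-element reuse buffer with stride 3.

```python
import numpy as np

# Hypothetical reference model (NumPy only, not HeteroCL): a 3-tap window
# along x, kept per channel, for an input of shape (H, W, C).
WEIGHTS = {0: (1, 1, 1), 1: (3, 5, 6)}  # taps taken from the test above

def sliding_window_reference(a):
    h, w, ch = a.shape
    out = np.zeros((h, w - 2, ch), dtype=a.dtype)
    for y in range(h):
        window = np.zeros((3, ch), dtype=a.dtype)  # 3 taps x ch channels
        for x in range(w):
            for c in range(ch):
                # Shift this channel's taps by one, then load the new element.
                window[0, c] = window[1, c]
                window[1, c] = window[2, c]
                window[2, c] = a[y, x, c]
                if x >= 2:  # the window is full from the third column on
                    w0, w1, w2 = WEIGHTS[c]
                    out[y, x - 2, c] = (w0 * window[0, c]
                                        + w1 * window[1, c]
                                        + w2 * window[2, c])
    return out
```

Running this on the same `np_A` should reproduce `np_C` from the test, which gives a second ground truth to compare the scheduled result against.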

I am also wondering whether the reuse_at primitive could be modified to work with HeteroCL modules. Generating the reuse buffer along with .to() takes a lot of effort (I need to figure out which dimension to exploit for reuse, and I would also repeat much of the work reuse_at already does), and I am afraid it cannot be done very soon. Reusing the reuse_at primitive and making it compatible with HeteroCL modules might be a better idea.

hecmay commented 4 years ago

Another issue comes up when applying reuse_at to a HeteroCL module. It is caused by the index recovery function used in the IR pass. The index recovery function works fine in common cases:

for (c, 0, 10)
  for (rdx, 0, 3)
    reducer += b[c*3 + rdx] // recovered index : b[c, rdx]
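To make "index recovery" concrete, here is a minimal sketch of turning a flattened row-major offset back into a multi-dimensional index. It only illustrates the idea and is not the function used in the HeteroCL IR pass; `recover_index` and the shape `(10, 3)` are assumptions chosen to match the loop above.

```python
def recover_index(flat, shape):
    """Convert a row-major flat offset into a tuple of per-dimension indices."""
    idx = []
    # Peel off the innermost dimension first, then work outwards.
    for extent in reversed(shape):
        flat, remainder = divmod(flat, extent)
        idx.append(remainder)
    return tuple(reversed(idx))

# For the loop nest above, b is treated as a (10, 3) array,
# so the flat offset c*3 + rdx maps back to (c, rdx).
recovered = recover_index(4 * 3 + 2, (10, 3))
```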

However, it fails to recover the correct index when there is an additional if-then-else statement:

for (c, 0, 10)
  for (rdx, 0, 3)
    if (rdx < 2)
      reducer += b[c*3 + rdx + 1] // recovered index failure

I added a quick fix to the generate_reuse_buffer IR pass that updates the range of the iteration variable. With it, the index is recovered correctly and the reuse buffer is inserted as expected.
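One way to read why the range matters, offered as an assumption and not as a description of the actual change in generate_reuse_buffer: with `rdx` declared over `[0, 3)`, the access `rdx + 1` can reach 3, which spills out of an inner dimension of extent 3, so the flat offset no longer maps to a unique `(c, rdx + 1)`. Once the guard `rdx < 2` narrows the range to `[0, 2)`, the offset stays inside the inner dimension. The helper name below is hypothetical.

```python
def offset_stays_in_inner_dim(lo, hi, offset, extent):
    """True if var + offset lies in [0, extent) for every var in [lo, hi)."""
    return 0 <= lo + offset and (hi - 1) + offset < extent

# Declared loop range of rdx, ignoring the guard: rdx + 1 can reach 3.
without_guard = offset_stays_in_inner_dim(0, 3, 1, 3)

# Range narrowed by the guard `rdx < 2`: rdx + 1 is at most 2.
with_guard = offset_stays_in_inner_dim(0, 2, 1, 3)
```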

seanlatias commented 4 years ago

Please check #156

hecmay commented 4 years ago

Thanks! I will merge it. By the way, I cannot merge from the v0.3 branch. There is a segfault when importing hlib from v0.3.