cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0
326 stars 92 forks source link

Wrong CPU simulation result with HeteroCL module #148

Closed hecmay closed 4 years ago

hecmay commented 4 years ago

I created a HCl reducer to realize a sorting algorithm, and wrap it with a HeteroCL module. The gaol is to sort a 10x3 array in axis 1. I checked the IR line by line and did not find any wrong logic inside. But the simulation result is not correct in each row.

The Halide IR:

// attr [sort] storage_scope = "global"
produce sort {
  // attr [0] extern_scope = 0
  def sort(handle64(sort.A[10*3]), handle64(sort.B[10*3])) {
    allocate compute0[int32 * 3]
    for (x, 0, 3) {
      compute0[x] = 1000
    }
    for (x, 0, 10) {
      allocate reducer0[int32 * 3]
      for (args, 0, 3) {
        reducer0[args] = compute0[args]
      }
      for (rdx, 0, 3) {
        for (i, 0, 3) {
          if ((sort.A[(rdx + (x*3))] < reducer0[i])) {
            for (i, 0, ((i + -2)/-1)) {
              reducer0[(2 - i)] = reducer0[(1 - i)]
            }
          }
          reducer0[i] = sort.A[(rdx + (x*3))]
          break
        }
      }
      for (reducer0_i0, 0, 3) {
        sort.B[(reducer0_i0 + (x*3))] = reducer0[reducer0_i0]
      }
    }
  }
}
// attr [sort0] storage_scope = "global"
allocate sort0[int32 * 1]
produce sort0 {
  // attr [0] extern_scope = 0
  sort(A, B)
}

the input array:

[[14 12  6]
 [ 3  7 18]
 [16  2 11]
 [17 15  7]
 [ 4 16  8]
 [ 0 15  7]
 [ 0  0  6]
 [13  1  0]
 [10  6  6]
 [19 19 11]]

And the sorted result is wrong for most rows (for some rows the result is correct). I guess there might be something wrong with memory in LLVM JIT complication?

[[   6   12   14]
 [  18 1000 1000]
 [  11   16 1000]
 [   7   15   17]
 [   8   16 1000]
 [   7   15 1000]
 [   6 1000 1000]
 [   0    1   13]
 [   6   10 1000]
 [  11   19 1000]]
seanlatias commented 4 years ago

What's your HeteroCL code? The IR also looks incorrect to me. The break statement should be inside the if statement?

hecmay commented 4 years ago

What's your HeteroCL code? The IR also looks incorrect to me. The break statement should be inside the if statement?

You are right. I forgot to check the break statement . It should break right after finding the insertion point. The current code looks like this:

    def kernel(A, B):
        hlib.function.sort(A, B)

    A = hcl.placeholder((10, 3), name="A")
    B = hcl.placeholder((10, 3), name="B")
    s = hcl.create_schedule([A, B], kernel)

The implementation is in hlib.function

def sort(a, b, axis=1, init_value=1e3, name="sort"):
    """ sort in axis with ascending order """
    assert axis<len(a.shape) and len(a.shape)<=2, "invalid axis"
    assert a.shape == b.shape, "shape mismatch"
    size = a.shape[axis] # 2d insert sorting

    def sreduce(x, Y):
        with hcl.for_(0, size, name="i") as i:
          with hcl.if_(x < Y[i]):
            with hcl.for_(size-1,i,-1) as j:
              Y[j] = Y[j-1]
            Y[i] = x
            hcl.break_()

    def sort2d(A, B):
        init = hcl.compute((size,), lambda x: init_value)
        my_sort = hcl.reducer(init, sreduce)
        r = hcl.reduce_axis(0, size, name="rdx")
        if axis == 1:
          hcl.update(B, lambda x, _y:
              my_sort(A[x, r], axis=r), name=name)
        else: # sort in y axis
          hcl.update(B, lambda _x, y:
              my_sort(A[r, y], axis=r), name=name)

    # return decorated function
    module = hcl.def_([a.shape, b.shape], name=name)(sort2d)
    module(a, b)