cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0
326 stars 92 forks source link

Weird array indices generation #276

Open chhzh123 opened 4 years ago

chhzh123 commented 4 years ago

I'm not sure why the access pattern in the generated VHLS code of the following example becomes so complex. Seems it only happens when the output array is larger the input array.

def test_complex():
    dtype = hcl.Float()
    A = hcl.placeholder((1,1,8,8),"A",dtype)
    def kernel(A):
        return hcl.compute((1,1,10,10), lambda i, c, x, y: hcl.select(tvm.all(x < 8, y < 8),A[i, c, x, y],0), "B", dtype)
    s = hcl.create_schedule([A], kernel)
    target = hcl.platform.zc706
    target.config(compile="vivado_hls",mode="csim")
    f = hcl.build(s, target=target)
    hcl_A = hcl.asarray(np.zeros((1,1,8,8)))
    hcl_B = hcl.asarray(np.zeros((1,1,10,10)))
    f(hcl_A,hcl_B)
void test(float B[1][1][10][10], float A[1][1][8][8]) {
    B_x: for (bit32 x = 0; x < 10; ++x) {
      B_y: for (bit32 y = 0; y < 10; ++y) {
        B[0][0][x][y] = ((hls::max(x, y) < 8) ? ((float)A[(((y / 8) + x) / 8)][0][(((y / 8) + x) % 8)][(y % 8)]) : (0.000000e+00f));
      }
    }
  }

I think accessing A[0][0][x][y] is okay. Is the simplification logic somewhere not working properly?

chhzh123 commented 4 years ago

I found the problem. The simplification logic only works when the loop range is smaller than the array size. https://github.com/cornell-zhang/heterocl/blob/d3173471e877c32fd9327e882575499c46f10f69/tvm/src/pass/simple_passes.cc#L174-L176 I think it's unnecessary? And the correctness of the program should be guaranteed by the users (e.g. avoid accessing invalid memory). Even this comparison is added, the program can still access memory out of the boundary. For example, change the compute function to hcl.compute((1,1,10,10), lambda i, c, x, y: A[i, c, x, y], "B", dtype), then (y / 8 + x)/8 can be 1 when x=y=9, which is larger than the size of the 1st dimension. @seanlatias

seanlatias commented 4 years ago

This is the legacy from Halide. I think maybe we can add a tag or something to turn on/off this feature.

chhzh123 commented 4 years ago

I found the problem. The simplification logic only works when the loop range is smaller than the array size. https://github.com/cornell-zhang/heterocl/blob/d3173471e877c32fd9327e882575499c46f10f69/tvm/src/pass/simple_passes.cc#L174-L176

I think it's unnecessary?

These several lines should not be directly removed. Otherwise, the correctness of the program may not be ensured. As an example,

uint32 A[4][4];
uint32 B[16];
for (uint32 i = 0; i < 16; ++i)
    B[i] = A[i / 4][i % 4];

will be simplified to

for (uint32 i = 0; i < 16; ++i)
    B[i] = A[0][i];

which may cause SegFault and is incorrect.

zhangzhiru commented 4 years ago

@seanlatias we should first support multi-dimensional array in our internal IR.