cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0

Bit-masking converted to modulo operation with incorrect divisor. #398

Closed: jcasas00 closed this issue 3 years ago

jcasas00 commented 3 years ago

Code:

import heterocl as hcl

def mask(A):
    # Keep the low 16 bits of each element (a no-op for UInt(16) inputs).
    return hcl.compute(A.shape, lambda x: (A[x] & 0xFFFF), "mask", dtype=A.dtype)

A = hcl.placeholder((2,), "A", dtype=hcl.UInt(16))
s = hcl.create_schedule([A], mask)

print(hcl.lower(s))
m = hcl.build(s)

hcl_A = hcl.asarray([10, 5], dtype=A.dtype)
hcl_R = hcl.asarray([99, 99], dtype=hcl.UInt(16))
m(hcl_A, hcl_R)
print(f"hcl_R={hcl_R}")

Output:

// attr [_top] storage_scope = "global"
allocate _top[int32 * 1]
produce _top {
  // attr [0] extern_scope = 0
  produce mask {
    // attr [0] extern_scope = 0
    for "stage_name"="mask" (x, 0, 2) {
      mask[x] = (A[x] % (uint16)65536)
    }
  }
}

hcl_R=[99 99]           <---- not updated

The "& 0xFFFF" expression is lowered to "% (uint16)65536" because of A's dtype. But 65536 == 0x10000, which does not fit in 16 bits and becomes 0 when cast to uint16. I'm not sure whether the backend really generates "% 0" code, but modulo by zero is undefined, which may be why the output is not updated (although I would have expected a runtime failure). Why not just leave it as a bitwise_and?
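The arithmetic behind this can be checked in plain Python, without HeteroCL. This is only a sketch of the suspected rewrite, not the actual compiler pass: `cast_unsigned` and `mask_via_modulo` are hypothetical helpers that model C-style truncation to an unsigned type and the "mask becomes modulo" simplification.

```python
MASK = 0xFFFF


def cast_unsigned(value, bits):
    """Model a C-style cast to an unsigned integer of the given bit width."""
    return value % (1 << bits)


def mask_via_modulo(x, bits):
    """Hypothetical model of rewriting `x & MASK` as `x % (MASK + 1)`,
    with the divisor cast to an unsigned type of the given width."""
    divisor = cast_unsigned(MASK + 1, bits)
    if divisor == 0:
        # The divisor wrapped around to zero: the rewrite is invalid here.
        raise ZeroDivisionError("divisor 0x10000 truncated to 0")
    return x % divisor


# With a wide enough divisor type, the rewrite is equivalent to the mask.
for x in (0, 5, 10, 0xFFFF, 0x12345):
    assert (x & MASK) == mask_via_modulo(x, 32)

# With a 16-bit divisor, 65536 truncates to 0 and the rewrite breaks.
print(cast_unsigned(MASK + 1, 16))  # 0
```

Under these assumptions the rewrite is only safe when the divisor type is wider than the mask, which is one argument for keeping the bitwise AND as is.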

seanlatias commented 3 years ago

This is indeed a bug.

seanlatias commented 3 years ago

@jcasas00 could you try #399?

jcasas00 commented 3 years ago

Thanks. Will give this a try.

jcasas00 commented 3 years ago

Forgot to update. Yes, the fix seems to be okay now.