cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0

Can't initialize scalar with constants > 32-bits #400

Open jcasas00 opened 3 years ago

jcasas00 commented 3 years ago

Code:

    import heterocl as hcl

    def testme(A):
        def doit(x):
            v = hcl.scalar(0xFF_0000_0000, "v", dtype=hcl.UInt(64))
            v[0] = 0x00_FFFF_FFFF
            v[0] = 0xFF_0000_000F
            return v.v
        return hcl.compute(A.shape, lambda x: doit(x), "doit", dtype=hcl.UInt(64))

    A = hcl.placeholder((2,), "A", dtype=hcl.UInt(16))
    s = hcl.create_schedule([A], testme)

    print(hcl.lower(s))

Output:

    // attr [_top] storage_scope = "global"
    allocate _top[uint64 1]
    produce _top {
      // attr [0] extern_scope = 0
      produce doit {
        // attr [0] extern_scope = 0
        for "stage_name"="doit" (x, 0, 2) {
          // attr [v] storage_scope = "global"
          allocate v[uint64 1]
          produce v {
            // attr [0] extern_scope = 0
            for "stage_name"="v" (x, 0, 1) {
              v[x] = (uint64)0                  <--- why 0? Expecting 0xFF_0000_0000 (40-bits)
            }
          }
          v[0] = (uint64)18446744073709551615   <--- all 64-bits=1. The 0xFFFF_FFFF in the HCL code seems to be interpreted as -1 (32-bit), then sign-extended to 64-bits
          v[0] = (uint64)15                     <--- looks like only taking lower 32-bits (consistent with first case).
          doit[x] = v[0]
        }
      }
    }
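All three values in the lowered IR are consistent with the frontend keeping only the low 32 bits of each Python integer, reading them as a signed int32, and then converting that to uint64. The sketch below models this in plain Python. `to_int32` and `int32_to_uint64` are hypothetical helpers for illustration, not HeteroCL APIs, and the model is inferred from the printed IR rather than taken from HeteroCL's source.

```python
def to_int32(value):
    """Keep the low 32 bits of `value` and reinterpret them as a signed int32."""
    low = value & 0xFFFF_FFFF
    return low - (1 << 32) if low & 0x8000_0000 else low


def int32_to_uint64(value):
    """Sign-extend a signed int32 to 64 bits and reinterpret it as uint64."""
    return value & 0xFFFF_FFFF_FFFF_FFFF


# The three constants from the repro above.
for const in (0xFF_0000_0000, 0x00_FFFF_FFFF, 0xFF_0000_000F):
    print(hex(const), "->", int32_to_uint64(to_int32(const)))
# 0xff00000000 -> 0
# 0xffffffff -> 18446744073709551615
# 0xff0000000f -> 15
```

The model reproduces the IR exactly: the 40-bit constant loses everything above bit 31, 0xFFFF_FFFF becomes -1 and then all ones, and only 0xF survives from the last constant.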

seanlatias commented 3 years ago

This is a hard limitation in HCL right now. As a workaround, you can use the set-bit/set-slice APIs. Examples can be found here. For usage details, see PR #291.

jcasas00 commented 3 years ago

I tried breaking the constant up into 32-bit chunks and using the set-bit/set-slice APIs as follows:

    import heterocl as hcl

    def testme(A):
        def doit(x):
            x = 0xFA_FF00_FFFF
            v = hcl.scalar(0, "v", dtype=hcl.UInt(64))
            v[0][31:0]  = (x >>  0) & 0xFFFF_FFFF             # break up into 32-bit chunks ...
            v[0][63:32] = (x >> 32) & 0xFFFF_FFFF
            return v.v
        return hcl.compute(A.shape, lambda x: doit(x), "doit", dtype=hcl.UInt(64))

    A = hcl.placeholder((2,), "A", dtype=hcl.UInt(16))
    s = hcl.create_schedule([A], testme)

    print(hcl.lower(s))
    m = hcl.build(s)

    hcl_A = hcl.asarray([0xA0A0, 0xA0], dtype=A.dtype)
    hcl_R = hcl.asarray([99, 99], dtype=hcl.UInt(64))
    m(hcl_A, hcl_R)
    print(f"hcl_R = {[hex(i) for i in hcl_R.asnumpy()]}")
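For reference, here is the chunk arithmetic from the snippet above in plain Python. Both chunks fit in 32 bits. However, a chunk with bit 31 set is still negative when read as a signed int32, and that is where the -16711681 in the lowered IR comes from. `split_u64` and `as_int32` are hypothetical helpers, not part of HeteroCL.

```python
def split_u64(value):
    """Split a 64-bit unsigned value into (low, high) 32-bit chunks."""
    return value & 0xFFFF_FFFF, (value >> 32) & 0xFFFF_FFFF


def as_int32(chunk):
    """Reinterpret a 32-bit chunk as a signed int32."""
    return chunk - (1 << 32) if chunk & 0x8000_0000 else chunk


const = 0xFA_FF00_FFFF
low, high = split_u64(const)
print(hex(low), as_int32(low))    # 0xff00ffff -16711681
print(hex(high), as_int32(high))  # 0xfa 250
```

The low chunk's bit 31 is set, so reading it as int32 gives a negative number even though the chunk itself is in range. The high chunk (0xFA) is unaffected.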

The schedule looks okay:

      produce v {
        // attr [0] extern_scope = 0
        for "stage_name"="v" (x, 0, 1) {
          v[x] = (uint64)0
        }
      }
      v[0] = v[0][31:0].set(-16711681)    <-- looks like this still sign-extends the assignment (to all 64 bits, despite the slice spec)
      v[0] = v[0][63:32].set(250)         <-- expected this to clear bit 63, but it doesn't
      doit[x] = v[0]

But the result has the sign bit set for some reason:

    hcl_R = ['0x800000faff00ffff', '0x800000faff00ffff']   <-- bit 63 is set??

seanlatias commented 3 years ago

That's because our current implementation takes numbers in as int32, so the value got sign-extended. As for the slice API, I know it may be counterintuitive for Python users, but the upper bound is exclusive, so you need to write:

            v[0][32:0]  = (x >>  0) & 0xFFFF_FFFF             
            v[0][64:32] = (x >> 32) & 0xFFFF_FFFF

seanlatias commented 3 years ago

Note the difference in the upper bounds.
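The following model reproduces the observed result. It treats `v[hi:lo].set(val)` as replacing bits lo through hi-1 (upper bound exclusive, as seanlatias describes), where `val` is the sign-extended int32 and is not masked to the slice width. This is a hypothetical reconstruction inferred from the numbers in the thread, not HeteroCL's actual implementation. `set_slice` is an illustrative name.

```python
MASK64 = (1 << 64) - 1


def set_slice(word, hi, lo, val):
    """Hypothetical model of word[hi:lo].set(val) on a uint64.

    Clears bits lo..hi-1 (upper bound exclusive), then ORs in `val << lo`
    WITHOUT masking `val` to the slice width, so the sign-extended bits of
    a negative value leak into bits above the slice.
    """
    slice_mask = ((1 << (hi - lo)) - 1) << lo
    return ((word & ~slice_mask) | (val << lo)) & MASK64


LOW_CHUNK = -16711681  # 0xFF00_FFFF read as int32
HIGH_CHUNK = 250       # 0xFA

# Original bounds [31:0] and [63:32]: bit 63 lies outside the second slice,
# so the sign-extension bit left by the first set() survives.
v = set_slice(0, 31, 0, LOW_CHUNK)
v = set_slice(v, 63, 32, HIGH_CHUNK)
print(hex(v))  # 0x800000faff00ffff

# Suggested bounds [32:0] and [64:32]: the second set() clears bits 63..32.
v = set_slice(0, 32, 0, LOW_CHUNK)
v = set_slice(v, 64, 32, HIGH_CHUNK)
print(hex(v))  # 0xfaff00ffff
```

Under this model, the stray bit 63 comes from two things combined. The first chunk is sign-extended across the whole word, and with inclusive-looking bounds the second slice stops one bit short of clearing it. With the suggested bounds, the second slice covers all of bits 63..32 and the intended constant 0xFA_FF00_FFFF comes out.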

jcasas00 commented 3 years ago

Hmm, yes, using one extra upper bit is odd. Is this documented somewhere? I'll give it a try.

A possibly related question: is this the same issue as in the following code?

    hcl.asarray([[1, 1085102592571150095], [13, 14106333703424951235]], dtype=hcl.UInt(64))

    <tvm.NDArray shape=(2, 2), cpu(0)>
    array([[4607182418800017408, 4876868561191968286],
           [4623507967449235456, 4893293453950613624]], dtype=uint64)
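The returned values match IEEE-754 double bit patterns. For example, 4607182418800017408 is 0x3FF0000000000000, the encoding of 1.0. This suggests that the inputs passed through float64 before being reinterpreted as uint64, although the thread does not confirm it. If so, this is a different problem from the int32 handling of constants discussed earlier, and it would also lose precision for integers above 2**53. The standard-library check below compares the bit patterns. `float64_bits` is a hypothetical helper.

```python
import struct


def float64_bits(x):
    """Return the IEEE-754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


# Inputs from the asarray call, converted to float64 first.
for value in (1, 1085102592571150095, 13, 14106333703424951235):
    print(value, "->", float64_bits(float(value)))
# 1 -> 4607182418800017408
# 1085102592571150095 -> 4876868561191968286
# 13 -> 4623507967449235456
# 14106333703424951235 -> 4893293453950613624
```

All four results match the array printed in the thread, element for element.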

zhangzhiru commented 3 years ago

> v = hcl.scalar(0xFF_0000_0000, "v", dtype=hcl.UInt(64))

@seanlatias as a workaround, can we initialize a 64b scalar using a string constant?

> v[0] = (uint64)18446744073709551615 <--- all 64-bits=1. The 0xFFFF_FFFF in the HCL code seems to be interpreted as -1 (32-bit), then sign-extended to 64-bits

This is indeed very counterintuitive. Let's find a way to fix this issue.

seanlatias commented 3 years ago

> v = hcl.scalar(0xFF_0000_0000, "v", dtype=hcl.UInt(64))
>
> @seanlatias as a workaround, can we initialize a 64b scalar using a string constant?

Sure. Maybe this is something @zzy82518996 can work on?

zzy82518996 commented 3 years ago

I'll dig into it and see whether I can fix this issue.