cornell-zhang / hcl-dialect

HeteroCL-MLIR dialect for accelerator design
https://cornell-zhang.github.io/heterocl/index.html
Other
38 stars 17 forks source link

Unsigned struct field extracted as signed #158

Closed jcasas00 closed 1 year ago

jcasas00 commented 1 year ago
def test_struct_unsigned_field():
    hcl.init()
    rshape = (1,)
    stype = hcl.Struct({"x": hcl.UInt(4), "y": hcl.UInt(4)})
    def kernel():
        r = hcl.compute(rshape, lambda _:0, dtype=hcl.Int(32))
        a = hcl.scalar(12, "a", dtype=stype).v
        with hcl.if_(a.x == 12):
            r[0] = 1
        return r
    #
    s = hcl.create_schedule([], kernel)
    print(hcl.lower(s))
    hcl_res = hcl.asarray(np.zeros(rshape, dtype=np.uint32), dtype=hcl.UInt(32))
    f = hcl.build(s)
    f(hcl_res)
    np_res = hcl_res.asnumpy()
    golden = np.asarray([1], dtype=np.uint32)
    assert np.array_equal(golden, np_res)

The test above fails since the value of a.x is interpreted and extended as a signed integer instead of unsigned as defined in the struct dtype.

    %4 = hcl.struct_get %3[0] {moved} : <i4, i4> -> i4
    %c12_i32 = arith.constant {moved} 12 : i32
    %5 = arith.extsi %4 {moved} : i4 to i32
    %6 = arith.cmpi eq, %5, %c12_i32 {moved} : i32
jcasas00 commented 1 year ago

Added urgent label on this one. The issue with this is that the usage of struct fields is pervasive and while we can work-around it by explicitly casting the value, this is not practical on a big program. The other case that this is causing an issue, and I suspect why I'm seeing segfaults, is when we have memory indexing (e.g., mem[a.x] = 5). Since the a.x value will be interpreted as a negative number, who knows what memory area it is overwriting.

double free or corruption (!prev) Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var LLVM_SYMBOLIZER_PATH to point to it): /home/jcasas/dprive/heterocl_mlir/hcl-dialect-prototype/build/tools/hcl/python_packages/hcl_core/hcl_mlir/_mlir_libs/libHCLMLIRAggregateCAPI.so.15(+0x11bf904)[0x14c7c37fb904] /home/jcasas/dprive/heterocl_mlir/hcl-dialect-prototype/build/tools/hcl/python_packages/hcl_core/hcl_mlir/_mlir_libs/libHCLMLIRAggregateCAPI.so.15(+0x11bd064)[0x14c7c37f9064] /lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0)[0x14c7cfafd3c0] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x14c7cf7de03b] /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x14c7cf7bd859]

zzzDavid commented 1 year ago

Added struct get op type hint to fix this issue in ecf4cfb03c98a354173ccb01801a56e936619860

jcasas00 commented 1 year ago

Thanks Niansong. This passes the test above now but not the memory access use case I mentioned. Here's a test case:

def test_struct_unsigned_field_memaccess():
    hcl.init()
    rshape = (1,)
    bits = 20
    addr = 1 << (bits - 1)
    stype = hcl.Struct({"x": hcl.UInt(bits), "y": hcl.UInt(4)})
    def kernel():
        r = hcl.compute(rshape, lambda _:0, dtype=hcl.UInt(32))
        mem = hcl.compute((1<<bits,), lambda _:0, dtype=hcl.UInt(32))
        a = hcl.scalar(addr, "a", dtype=stype).v

        mem[a.x] = 0xdeadbeef       # mem[a.x] access causes a segfault
        r[0] = mem[addr]
        return r
    #
    s = hcl.create_schedule([], kernel)
    print(hcl.lower(s))
    hcl_res = hcl.asarray(np.zeros(rshape, dtype=np.uint32), dtype=hcl.UInt(32))
    f = hcl.build(s)
    f(hcl_res)
    np_res = hcl_res.asnumpy()
    golden = np.asarray([0xdeadbeef], dtype=np.uint32)
    assert np.array_equal(golden, np_res), f"np_res = {np_res}"

I see the IR is annotated as unsigned but I think it is still treated as signed because of the arith.index_cast.
It probably should use arith.index_castui given the unsigned attribute.

    %5 = hcl.struct_get %4[0] {unsigned} : <i20, i4> -> i20
    %c-559038737_i32 = arith.constant -559038737 : i32
    %6 = arith.index_cast %5 : i20 to index
    memref.store %c-559038737_i32, %1[%6] {to = "compute_1", unsigned} : memref<1048576xi32>
zzzDavid commented 1 year ago

Thanks for the suggestion! I agree using arith.index_castui should solve the issue, but this operation is added after LLVM15, specifically this commit 23 days ago: llvm/llvm-project@f8fafe99a4ee2c047acf5a79d1033da8024f1f26. We will fix this issue next time we bump the LLVM version.

zzzDavid commented 1 year ago

Just checked the latest release llvmorg-15.0.3 doesn't include this change, we probably have to wait for a later release

jcasas00 commented 1 year ago

Before the LLVM version with arith.index_castui is released, an interim workaround is to cast the index value to unsigned bits+1 size. For example, instead of: mem[a.x] = ... Effectively do: mem[ hcl.cast(hcl.UInt(hcl.get_bitwidth(a.x.dtype)+1),a.x) ] = ...

With the https://github.com/cornell-zhang/hcl-dialect-prototype/commit/ecf4cfb03c98a354173ccb01801a56e936619860 fix, this now generates the following IR which works for the above test case: %5 = hcl.struct_get %4[0] {unsigned} : <i20, i4> -> i20 %6 = arith.extui %5 {unsigned} : i20 to i21 %c-559038737_i32 = arith.constant -559038737 : i32 %7 = arith.index_cast %6 : i21 to index memref.store %c-559038737_i32, %1[%7] {to = "compute_1", unsigned} : memref<1048576xi32>

zzzDavid commented 1 year ago

This issue is fixed after https://github.com/cornell-zhang/heterocl/pull/473 is merged. Test case added by cornell-zhang/heterocl@45e8b03a2148b02fec8916d5ff3058aa3deb5a52

The revamped frontend introduces an extensible type system, so all expressions are typed before building the IR. We can also do assert hcl.scalar(12, "a", dtype=stype).v.x.dtype == hcl.UInt(4)