Closed jcasas00 closed 1 year ago
This issue is turning out to be what's causing the most issues now (and workarounds can be quite invasive). If all references to the .v are within one function, it is easy to fix. The problem is that if it is passed to some other function, then failures would depend on how that function uses it, etc. We're suddenly in a situation where changing some low-level function (e.g., to add a bounds check) will cause a fault because it created additional references to the expression.
The same issue also seems to apply to struct fields.
Thanks! We are discussing about a proper fix for this and many related issues. Hongzheng explained this in one of our conversations, let me paste it here for reference:
I notice most of the bugs of the control flow constructs are actually caused by Python’s language features. For example, #71, #135, #138, #141, #148 are all
hcl.if_
/hcl.while_
problems. The main complexity comes from the Python evaluation order. Considerhcl.if_(x == 0)
, the conditionx==0
part is evaluated (the IR is already built) before calling thehcl.if_
function. In the current workflow, we need to do the following things:
- Traverse back from
x==0
(CmpOp) and check whether this expression is affine. If the expression is non-affine, then it is fine. We can simply generate thescf.if
statement and plug the result of CmpOp into it.- If the expression is affine, we need to traverse back again to build an affine set expression and remove all the previous operations involving this condition, otherwise the condition will be computed twice. We require this separation between affine and non-affine expression to leverage the
affine.if
facility.When traversing back, we may remove some of the useful operations by mistake, since we do not know the scope of the variable before executing the following code. It is not like traditional parser that can read the code from left to right and generate the corresponding IR. Currently we are highly restricted by the host language, so it is a bit counterintuitive when building these control flows with conditions.
Hi @jcasas00, could you try latest commits and see if it works?
It seems to work with the .v usage now.
But it still fails if it is a struct field. For example, if the inst_id in the example above was defined as:
stype = hcl.Struct({"x": hcl.UInt(8), "y": hcl.UInt(8)})
inst_id = hcl.scalar(0x1234, "foo", dtype=stype).v.x
The error in this case is "error: operand #0 does not dominate this use". But the usage pattern that causes the issue is the same (i.e., multiple hcl.if_(i ...) statements).
Ah yes, the scope issue. When a LoadOp
is moved to another block I also rebuild one in the original place. I haven't done that for StructGetOp
. I'll add this for StructGetOp
, GetBitOp
, GetSliceOp
etc.
The same error (#0 does not dominate this use) in the following scenario (note the "+ 1"):
inst_id = hcl.scalar(0, "inst_id", dtype=hcl.UInt(16)).v + 1
Hi Jeremy, could you pull both repositories and see if it works now? I've added test cases on the CI: https://github.com/cornell-zhang/heterocl/blob/hcl-mlir/tests/mlir/test_scalar.py
The test cases cover
This version seems to work now! Thanks for the quick fixes/updates!!
I'm seeing another failure with the same cause signature (i.e., an expression used multiple times).
def kernel():
f = hcl.scalar(0, "f", dtype='uint32')
t = hcl.scalar(0, "t", dtype='uint32')
c = f.v == 0
with hcl.if_(c == 0):
hcl.print((),"c is false")
with hcl.while_(c == 0): # ... hang
t.v = 0 # some irrelevant expr
r = hcl.compute(rshape, lambda _:0, dtype=hcl.Int(32))
return r
In this case, because the hcl.while_ statement is using 'c', the following error is generated:
python: /home/jcasas/dprive/llvm-project/mlir/lib/IR/Block.cpp:231: mlir::Operation* mlir::Block::getTerminator(): Assertion '!empty() && back().mightHaveTrait<OpTrait::IsTerminator>()' failed.
The intent of this code was an assert w/a for now (i.e., print the failure and hang). I could use a different while test (e.g., while((f.v != 0) and get it to work but the system should still not fault with the code above.
This seems like a different issue with the same cause. hcl.while_
used ASTVisitor in "remove" mode to remove and rebuild the condition operations in scf.while
's condition block. Consequently, it removed the loadOp
built from f.v
, causing an assertion fault. I have fixed this issue by replacing the AST remover with an AST mover that doesn't remove any op but instead moves them to the right block.
Commit: 592b29864d4b36ec85bfa9d82c0d22209d19d6bd
Removing urgent tag as the primary issue has been resolved.
The scope issue has been resolved by revamping the frontend and adding HCL AST: https://github.com/cornell-zhang/heterocl/pull/473
generates the following error:
A workaround is to NOT lookup the .v value on assignment and instead use the .v where inst_id is used. Intuitively, this change shouldn't matter though. Is there an assumption that each .v result is only used once? If so, the coding style used above where the the .v is dereferenced once but used multiple times will not work.