Cookiee235 closed this issue 2 weeks ago.
Looks like the test case can be made even simpler, and isn't limited to `RemoveUnusedOutputs`. The root cause is that `StructuralEqual` compared TIR floats by checking `abs(lhs - rhs) < 1e-9`, which always evaluates to false for NaN values.
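This follows from IEEE 754 semantics: every ordered comparison involving NaN is false, so a tolerance-based check can never report two NaN constants as equal. A quick illustration in plain Python:

```python
import math

lhs = float("nan")
rhs = float("nan")

# NaN propagates through the subtraction, and every ordered comparison
# involving NaN evaluates to False, so the tolerance check never succeeds.
print(abs(lhs - rhs) < 1e-9)                 # False
print(lhs == rhs)                            # False
print(math.isnan(lhs) and math.isnan(rhs))   # True: explicit NaN handling works
```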
```python
@T.prim_func(private=True)
def func_1():
    return T.float32("nan")

@T.prim_func(private=True)
def func_2():
    return T.float32("nan")

tvm.ir.assert_structural_equal(func_1, func_2)
```
I've implemented #17249, which should fix this issue by adding special handling for NaN values to `StructuralEqual` and `StructuralHash`.
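The actual fix lives in TVM's C++ reflection code, but the idea can be sketched in Python (the helper names below are hypothetical, not TVM API):

```python
import math

def floats_structurally_equal(lhs, rhs, tol=1e-9):
    # Hypothetical sketch: treat two NaNs as structurally equal,
    # otherwise fall back to the tolerance comparison.
    if math.isnan(lhs) and math.isnan(rhs):
        return True
    return abs(lhs - rhs) < tol

def float_structural_hash(value):
    # All NaN values must map to a single hash bucket so that
    # structurally-equal values also hash equally.
    if math.isnan(value):
        return hash("nan")
    return hash(value)
```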
@Lunderberg Thanks for fixing the incorrect implementation of `assert_structural_equal`.
I have another question. For the given IRs in my script, an odd IR was obtained after applying the `RemoveUnusedOutputs` optimization. It seems the function `func` should not be removed. Can you help me check whether this is a bug?
```python
@I.ir_module
class Module:
    @R.function
    def main(v0_0: R.Tensor((1,), dtype="int32"), v1_0: R.Tensor((42,), dtype="int32")) -> R.Tuple(R.Prim(value=T.float64("nan")), R.Prim(value=T.float64("nan")), R.Prim(value=T.float64("nan"))):
        R.func_attr({"num_input": 2})
        with R.dataflow():
            res: R.Tuple(R.Prim(value=T.float64("nan")), R.Prim(value=T.float64("nan")), R.Prim(value=T.float64("nan"))) = R.prim_value(T.float64("nan")), R.prim_value(T.float64("nan")), R.prim_value(T.float64("nan"))
            R.output(res)
        return res
```
Ooh, I had missed that part, and thought there was a `nan` inside the original model. Thank you for calling my attention to it.
The introduction of `nan` is a bug in `RemoveUnusedOutputs`. When determining which outputs of a callee are used, it only collected usages of the form `TupleGetItem(out_tuple, index)`. If the tuple is used in a context that doesn't access a specific element, such as being returned from a function, then that usage is skipped.
The `NaN` values are intended as placeholders: dummy values that keep the remaining tuple indices aligned. If a callee produces `(A, B, C)` but `B` is never used, then the callee would be updated to produce `(A, C)`, and callsites would be updated to replace `res = callee()` with `new_output = callee(); res = (new_output[0], NaN, new_output[1])`. The intermediate tuple would then be deconstructed by `CanonicalizeBindings`. Since nothing ever accessed `res[1]`, the `NaN` value at that location would drop out altogether.
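The callsite rewrite described above can be modeled with a toy helper (hypothetical, not the actual TVM pass):

```python
import math

def rewrite_callsite(outputs, used_indices):
    """Toy model of the RemoveUnusedOutputs callsite rewrite: the callee
    now returns only the used elements, and the callsite rebuilds a
    tuple of the original shape with NaN placeholders in unused slots.
    (Hypothetical helper for illustration, not TVM API.)"""
    pruned = [v for i, v in enumerate(outputs) if i in used_indices]
    it = iter(pruned)
    rebuilt = [next(it) if i in used_indices else float("nan")
               for i in range(len(outputs))]
    return pruned, rebuilt

# If the callee produced (A, B, C) but B is unused, only slot 1 becomes NaN.
pruned, rebuilt = rewrite_callsite([1.0, 2.0, 3.0], {0, 2})
```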
So, if a function called a subroutine that produces a tuple and then immediately returned that tuple, the usage would fail to be collected, and the tuple elements would be erroneously replaced with `NaN`. This should be resolved by https://github.com/apache/tvm/pull/17253.
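The fix can be pictured as extending the usage collection: when the tuple escapes whole (e.g., it is returned directly), every element must be treated as used. A hypothetical sketch of that behavior:

```python
def collect_used_indices(usages, tuple_size):
    # usages: "whole" for a whole-tuple use (e.g. a direct return), or an
    # int index for a TupleGetItem access. Hypothetical sketch of the
    # post-fix behavior, not TVM's actual data structures.
    used = set()
    for u in usages:
        if u == "whole":
            # Whole-tuple escape: every element is reachable.
            return set(range(tuple_size))
        used.add(u)
    return used
```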
Re-opening, as the auto-close from #17249 wasn't correct. This issue still requires #17253 to land in order to be resolved.
Hi all, the pass `RemoveUnusedOutputs` seems to give an unexpected optimized result. Due to the lack of detailed documentation for this API (e.g., `relax.transform.RemoveUnusedOutputs`), I cannot confirm whether the optimization result is wrong.

In addition, another bug concerns the API `tvm.ir.assert_structural_equal`: for two identical modules, this API judges their structures to be unequal. It is triggered by IRs containing the string "nan".

Actual behavior

Steps to reproduce

cc @Lunderberg @junrushao