Open naoyam opened 1 month ago
Should we also handle something like this:
t0: [i0]
t1 = t0 // [i1]
t2 = t1 // [i2]
t3 = t2 // [i3/2, 2]
t4 = t3 // [i4]
t5 = t1 + t4 // [i5]
We cannot inline t1 at 1, although i2 is mapped with i5.
I think that's true in our current inlining system. In the original design of computeAt, however, the split of t3 would be propagated across the fusion to make them inlinable.
In general, I think that the analysis of inlinability needs to be a global analysis. ComputeAtLogicalDomainMap does that for the reduction-broadcast pattern, but that's not the only case that affects inlinability.
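To make the "global analysis" point concrete, here is a toy Python sketch of the t0..t5 example above. It is not nvFuser code: the names `loop_domains`, `consumers`, `can_inline_local` and `can_inline_global` are hypothetical, and comparing loop "signatures" as strings is an assumed stand-in for real iteration-domain mapping. It only illustrates that a check against direct consumers can accept a position that a check over every tensor between the producer and its consumers rejects.

```python
# Toy model (assumption): a tensor's loop domain is a tuple of loop
# "signatures"; two loops can share a for-loop only if the strings match.
loop_domains = {
    "t0": ("i",),
    "t1": ("i",),
    "t2": ("i",),
    "t3": ("i/2", "2"),  # t3 is split by 2
    "t4": ("i",),
    "t5": ("i",),
}

# producer -> direct consumers, following the example (t5 = t1 + t4)
consumers = {
    "t0": ["t1"],
    "t1": ["t2", "t5"],
    "t2": ["t3"],
    "t3": ["t4"],
    "t4": ["t5"],
    "t5": [],
}


def reachable_from(start):
    """All tensors strictly downstream of `start`."""
    seen, stack = set(), [start]
    while stack:
        for c in consumers[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def tensors_between(producer):
    """Tensors on some path from `producer` to one of its direct consumers."""
    downstream = reachable_from(producer)
    return {
        t
        for c in consumers[producer]
        for t in downstream
        if t == c or c in reachable_from(t)
    }


def can_inline_local(producer, pos):
    """Pairwise check: only compares against direct consumers."""
    prefix = loop_domains[producer][:pos]
    return all(loop_domains[c][:pos] == prefix for c in consumers[producer])


def can_inline_global(producer, pos):
    """Also requires every intermediate tensor to share the same outer loops."""
    prefix = loop_domains[producer][:pos]
    return all(loop_domains[t][:pos] == prefix for t in tensors_between(producer))


print(can_inline_local("t1", 1))   # t2 and t5 both match t1's outer loop
print(can_inline_global("t1", 1))  # t3 sits between t1 and t5 and does not match
```

In this model the local check accepts inlining t1 at 1, while the global check rejects it because t3, which has to be computed inside the same loop as t5, has a different outermost loop. Propagating the split of t3 to the other tensors, as the original computeAt design would, corresponds to giving every tensor the ("i/2", "2") domain, after which the global check passes as well.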
In RoPE-like fusions, where a domain is sliced and then padded back to the original domain, inlining seems to need to consider a constraint that is similar to the persistent constraint in normalization fusions.
Simplified example:
Here, i2 cannot be inlined into i3 since the extent of i2 is larger than that of i3. That also means i1 cannot be inlined into i2: if that were done, i5 would also be pulled into the same inlined loop, which would in turn mean i3 and i4 would need to be pulled in together. Since i2 and i3 cannot be inlined together, that inlining pattern is invalid.
This is quite similar to the inlining constraint due to the reduction-broadcast pattern in normalization. I think ideally we should generalize the analysis to consider constraints like this case.