Currently, we treat each operation as a new object and rebuild it even if an identical operation has already been built, which incurs redundant memory accesses and computation. In the following example, without duplicate operation detection, A[x] may be loaded several times.
From the generated MLIR code, we can see that three loads are needed.
The original HeteroCL implementation also requires three loads. This is expected, since TVM's one-line code cannot reuse the operands without expression folding.
After adding duplicate detection, we can reuse previous results and generate much cleaner code.
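The effect of duplicate detection can be illustrated with a toy IR builder that caches each operation by its (opcode, operands) key, similar to hash-consing or common subexpression elimination at construction time. This is a minimal sketch, not HeteroCL's actual implementation. The `IRBuilder` class, its methods, the textual op format, and the expression A[x] * A[x] + A[x] are all hypothetical stand-ins. The sketch also assumes that reusing a load is only safe while no store to the same buffer intervenes, so `store()` drops cached loads of that buffer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class IRBuilder:
    """Hypothetical SSA-style builder; `dedup` toggles duplicate detection."""

    dedup: bool = True
    ops: List[str] = field(default_factory=list)
    _cache: Dict[Tuple[str, ...], str] = field(default_factory=dict, init=False)
    _next: int = field(default=0, init=False)

    def _emit(self, key: Tuple[str, ...], text: str) -> str:
        # With dedup on, an op whose (opcode, operands) key was seen before
        # returns the existing SSA value instead of emitting a new op.
        if self.dedup and key in self._cache:
            return self._cache[key]
        name = f"%{self._next}"
        self._next += 1
        self.ops.append(f"{name} = {text}")
        if self.dedup:
            self._cache[key] = name
        return name

    def load(self, buf: str, idx: str) -> str:
        return self._emit(("load", buf, idx), f"load {buf}[{idx}]")

    def mul(self, a: str, b: str) -> str:
        return self._emit(("mul", a, b), f"mul {a}, {b}")

    def add(self, a: str, b: str) -> str:
        return self._emit(("add", a, b), f"add {a}, {b}")

    def store(self, buf: str, idx: str, val: str) -> None:
        # A store may change any element of `buf`, so cached loads of that
        # buffer are no longer safe to reuse.
        self._cache = {k: v for k, v in self._cache.items()
                       if not (k[0] == "load" and k[1] == buf)}
        self.ops.append(f"store {val}, {buf}[{idx}]")


def build(dedup: bool) -> IRBuilder:
    b = IRBuilder(dedup=dedup)
    # Hypothetical expression with three references to A[x]: A[x] * A[x] + A[x]
    prod = b.mul(b.load("A", "x"), b.load("A", "x"))
    b.add(prod, b.load("A", "x"))
    return b


if __name__ == "__main__":
    for dedup in (False, True):
        b = build(dedup)
        n_loads = sum(" = load " in op for op in b.ops)
        print(f"dedup={dedup}: {n_loads} load(s)")
        print("\n".join(b.ops))
```

Without dedup the builder emits three loads of A[x]. With dedup it emits a single load whose SSA value feeds both the multiply and the add. Because operands are already deduplicated SSA names, structurally identical subexpressions get identical keys and collapse bottom-up.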