Mat mul whole array implementation using tiler helper tools

The changes to mat mul whole array in https://github.com/Xilinx/mlir-aie/pull/1870 did not change the implementation of the design at all; they only added some hooks to help with visualization.

The implementation changes in this PR use TensorTiler2D to generate the offsets/sizes/strides in a new copy of the mat mul whole array design.

This is useful for comparing how the code looks in each approach. In addition, the B tiles produced by the tiler are functionally equivalent to the B tiles produced by the original design, but their sizes/strides are different. I plan to use this branch to benchmark whether that difference causes any performance change between the two implementations.
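To illustrate how two descriptors with different sizes/strides can be functionally equivalent, here is a minimal sketch in plain Python. It does not use the TensorTiler2D API; the helper `access_order` and both descriptors are hypothetical, chosen only to show the idea, and are not the actual B tile parameters from either design.

```python
from itertools import product


def access_order(offset, sizes, strides):
    """Expand an (offset, sizes, strides) descriptor into the list of
    linear element indices it visits, outermost dimension first.

    Hypothetical helper for illustration; not part of mlir-aie.
    """
    indices = []
    # product() varies the last dimension fastest, matching a nested loop
    # with the first entry of `sizes` as the outermost loop.
    for idx in product(*(range(s) for s in sizes)):
        indices.append(offset + sum(i * st for i, st in zip(idx, strides)))
    return indices


# Illustrative descriptors (assumed values, not taken from the PR):
# 4 rows of 8 elements with a row stride of 8 ...
desc_a = dict(offset=0, sizes=[1, 1, 4, 8], strides=[0, 0, 8, 1])
# ... versus a single run of 32 contiguous elements.
desc_b = dict(offset=0, sizes=[1, 1, 1, 32], strides=[0, 0, 0, 1])

same = access_order(**desc_a) == access_order(**desc_b)
print(same)
```

Both descriptors visit the same elements in the same order, so they describe the same transfer even though their sizes/strides differ. Whether the hardware handles the two forms equally fast is a separate question, which is what the benchmark is meant to answer.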

Update: a benchmark sweep does not show a performance difference, so I am going to try to merge.

Mat mul whole array implementation using tiler helper tools #1924