Prior to this commit, the scheduling assumed the gemm block would be the second-to-last block in the function (the "unpadding" step is the final block). However, when dense is fused with a bias or activation, the gemm block is no longer the second-to-last block. This commit instead searches for a single reduction block to use as the gemm block.
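To illustrate the idea, here is a minimal sketch in plain Python that does not use the TVM API. The `Block` class, the `find_gemm_block` helper, and the block names are hypothetical stand-ins for this example only; the actual schedule rule inspects TIR blocks. It shows why a positional lookup breaks once extra blocks are fused in, while a search for the only block with a reduction axis still finds the gemm.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical iterator kinds, standing in for spatial/reduction axes.
SPATIAL = "spatial"
REDUCE = "reduce"


@dataclass
class Block:
    """Hypothetical stand-in for a compute block in a function."""
    name: str
    iter_types: List[str]


def find_gemm_block(blocks: List[Block]) -> Optional[Block]:
    """Return the only block with a reduction axis, or None if there
    is not exactly one such block."""
    reduction_blocks = [b for b in blocks if REDUCE in b.iter_types]
    if len(reduction_blocks) != 1:
        return None
    return reduction_blocks[0]


# Plain dense: the gemm happens to be second to last.
plain = [
    Block("A_padded", [SPATIAL, SPATIAL]),
    Block("B_padded", [SPATIAL, SPATIAL]),
    Block("gemm", [SPATIAL, SPATIAL, REDUCE]),
    Block("unpad", [SPATIAL, SPATIAL]),
]

# Dense fused with bias and activation: extra blocks sit between
# the gemm and the final unpadding block.
fused = [
    Block("A_padded", [SPATIAL, SPATIAL]),
    Block("B_padded", [SPATIAL, SPATIAL]),
    Block("gemm", [SPATIAL, SPATIAL, REDUCE]),
    Block("bias_add", [SPATIAL, SPATIAL]),
    Block("relu", [SPATIAL, SPATIAL]),
    Block("unpad", [SPATIAL, SPATIAL]),
]

plain_gemm = find_gemm_block(plain)
fused_gemm = find_gemm_block(fused)
```

In this sketch, `plain[-2]` and `find_gemm_block(plain)` agree, but `fused[-2]` is the activation block, so only the search picks out the gemm. Returning `None` when the number of reduction blocks is not exactly one is a choice made for this sketch.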
cc @ekalda @Anndrey24