This PR is the last step of the Distributed Tridiagonal Solver optimization, aimed at reducing the GEMM cost.
After all the preliminary work to make the tridiagonal solver work with a well-shaped matrix of eigenvectors, it is now possible to exploit that structure to actually reduce the GEMM cost.
Main changes:
- compute the "geometry" of the reduced GEMM
- reduce the GEMM step and just copy the deflated part
- remove `fill1` from rank1 (it was used to make the GEMM also copy the eigenvectors of the deflated eigenvalues, but now the deflated part is copied directly)
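The idea behind the reduced GEMM can be illustrated with a minimal single-node sketch. It assumes a column-major dense matrix, eigenvectors already ordered so that the `k` non-deflated columns come first, and the hypothetical helpers `Matrix`, `full_gemm` and `reduced_gemm`. None of these are DLA-Future's API, and the sketch ignores the distributed layout and the finer "geometry" handled in this PR. `full_gemm` mimics the old approach (identity filled into the deflated block of the rank-1 eigenvectors, then a full GEMM), while `reduced_gemm` multiplies only the non-deflated block and copies the deflated columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major dense matrix, for illustration only.
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// Old approach (sketch): u holds the rank-1 eigenvectors with an identity
// block in the deflated part, and the whole product q * u is computed.
Matrix full_gemm(const Matrix& q, const Matrix& u) {
  assert(q.cols == u.rows);
  Matrix e(q.rows, u.cols);
  for (std::size_t j = 0; j < u.cols; ++j)
    for (std::size_t l = 0; l < q.cols; ++l)
      for (std::size_t i = 0; i < q.rows; ++i)
        e(i, j) += q(i, l) * u(l, j);
  return e;
}

// Reduced approach (sketch): u_k is the k x k block of eigenvectors for the
// non-deflated eigenvalues; the GEMM is restricted to the first k columns
// and the deflated columns of q are copied unchanged.
Matrix reduced_gemm(const Matrix& q, const Matrix& u_k, std::size_t k) {
  assert(u_k.rows == k && u_k.cols == k && k <= q.cols);
  Matrix e(q.rows, q.cols);
  // Reduced GEMM: (n x k) * (k x k).
  for (std::size_t j = 0; j < k; ++j)
    for (std::size_t l = 0; l < k; ++l)
      for (std::size_t i = 0; i < q.rows; ++i)
        e(i, j) += q(i, l) * u_k(l, j);
  // Deflated part: plain copy, no multiplication needed.
  for (std::size_t j = k; j < q.cols; ++j)
    for (std::size_t i = 0; i < q.rows; ++i)
      e(i, j) = q(i, j);
  return e;
}
```

In this simplified model both functions return the same matrix, but the full product costs about 2n³ flops while the reduced one costs about 2nk² plus a copy, so the saving grows with the number of deflated eigenvalues.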
Close #916