Closed adeeconometrics closed 4 months ago
`dev-lazymatrix` implements a simpler `std::vector` internal. Some of the benefits found in this branch are:

- `clang loop vectorize`
- up to 690 GFLOPs for M2 `Matrix` implementation

Some contentions in reverting back to `std::array` for smaller matrices, $\mathbb{R}^{M \times N} < \mathbb{R}^{256 \times 256}$, are: (a) stack size varies per hardware and per type, so it is difficult to design for reliable performance and universal types; (b) the C++ standard does not impose a stack size requirement for the `std::thread` library, so it is difficult to conditionally adapt the $M, N$ size per hardware (the metadata is not available).

Note: the function that was evaluated is

$$A \times B + A \cdot B \cdot (\sin(A) \times \cos(A) + B)$$
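
To make the `std::array` versus `std::vector` trade-off concrete, here is a minimal sketch of compile-time storage selection. It is not the code in this branch: `SketchMatrix`, `Storage`, `zero_init`, and `kMaxInlineBytes` are hypothetical names, and the 64 KiB budget is an arbitrary illustrative value, since (as noted above) there is no portable way to query a safe stack size.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical budget for inline (std::array) storage, in bytes.
// 64 KiB is an illustrative guess, not a portable guarantee.
constexpr std::size_t kMaxInlineBytes = 64 * 1024;

// std::array when the buffer fits the budget, std::vector otherwise.
template <typename T, std::size_t M, std::size_t N>
using Storage = typename std::conditional<
    (M * N * sizeof(T) <= kMaxInlineBytes),
    std::array<T, M * N>,
    std::vector<T>>::type;

// Overloads that zero-initialise either storage kind.
template <typename T, std::size_t K>
void zero_init(std::array<T, K>& a, std::size_t) { a.fill(T{}); }

template <typename T>
void zero_init(std::vector<T>& v, std::size_t n) { v.assign(n, T{}); }

// Hypothetical row-major matrix; deliberately not named Matrix to avoid
// confusion with the class in this branch.
template <typename T, std::size_t M, std::size_t N>
class SketchMatrix {
public:
  SketchMatrix() { zero_init(data_, M * N); }

  // True when the elements live inside the object itself (std::array).
  static constexpr bool uses_inline_storage() {
    return std::is_same<Storage<T, M, N>, std::array<T, M * N>>::value;
  }

  T& operator()(std::size_t i, std::size_t j) {
    assert(i < M && j < N);
    return data_[i * N + j];
  }

  const T& operator()(std::size_t i, std::size_t j) const {
    assert(i < M && j < N);
    return data_[i * N + j];
  }

private:
  Storage<T, M, N> data_;
};
```

With this budget, `SketchMatrix<double, 8, 8>` keeps its 512 bytes inline, while `SketchMatrix<double, 256, 256>` (512 KiB of `double`) falls back to `std::vector`. The weak point is exactly contention (a): the right value for `kMaxInlineBytes` depends on the platform and the thread the object lives on, and the standard offers no way to ask.
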
Avenues to explore:

- Should see if the refactored `Matrix::operator=` improves performance for larger matrices
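
One possible way to check the assignment cost is a small timing loop. The sketch below is an assumption about how such a measurement could look, not the benchmark used in this branch: `BenchMatrix` and `mean_copy_assign_ns` are hypothetical names, and `BenchMatrix` relies on the compiler-generated copy assignment as a baseline to compare a refactored `operator=` against.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a std::vector-backed, row-major matrix.
// Copy and move assignment are compiler-generated; replace them with the
// refactored operator= to compare the two.
struct BenchMatrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;

  BenchMatrix(std::size_t r, std::size_t c, double value = 0.0)
      : rows(r), cols(c), data(r * c, value) {}
};

// Mean wall-clock time, in nanoseconds, of one copy assignment `dst = src`,
// averaged over `reps` repetitions. Requires a non-empty source and reps > 0.
double mean_copy_assign_ns(const BenchMatrix& src, int reps) {
  assert(!src.data.empty() && reps > 0);
  BenchMatrix dst(src.rows, src.cols);
  volatile double sink = 0.0;  // read the result so the copy is not optimised away

  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < reps; ++i) {
    dst = src;
    sink = dst.data.front();
  }
  const auto stop = std::chrono::steady_clock::now();
  (void)sink;

  return std::chrono::duration<double, std::nano>(stop - start).count() / reps;
}
```

Calling `mean_copy_assign_ns` for a range of sizes (for example 64, 256, and 1024 rows and columns) before and after the refactor would show whether the change matters only past a certain size. Timings from a loop like this are noisy, so repeated runs and an optimised build are advisable.
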