Look into speeding up computations by aligning our vectors in memory and possibly mapping them directly? There are some tricks we can pursue to optimize memory performance. Algorithmically, the ideal implementation uses blocking: it may be possible to divide large matrices into smaller subproblems and accumulate the results, similar to the existing implementation.
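A minimal sketch of the blocking idea, to make the note concrete. This is not our existing implementation: the function name `matmul_blocked`, the flat row-major layout, and the default `block` size are all assumptions chosen for illustration. It splits the product into tiles and accumulates each tile's partial result into the output, which is the "smaller subproblems, accumulate result" pattern described above.

```python
def matmul_blocked(a, b, n_rows, n_inner, n_cols, block=64):
    """Multiply a (n_rows x n_inner) by b (n_inner x n_cols).

    Both inputs are flat, row-major lists (an assumed layout for this
    sketch). Returns the product as a flat, row-major list.
    """
    c = [0.0] * (n_rows * n_cols)
    # Walk the three index ranges in tiles of at most `block` elements.
    for i0 in range(0, n_rows, block):
        i1 = min(i0 + block, n_rows)
        for k0 in range(0, n_inner, block):
            k1 = min(k0 + block, n_inner)
            for j0 in range(0, n_cols, block):
                j1 = min(j0 + block, n_cols)
                # Sub-problem: multiply one tile of a by one tile of b
                # and accumulate the partial result into the c tile.
                for i in range(i0, i1):
                    row_c = i * n_cols
                    for k in range(k0, k1):
                        a_ik = a[i * n_inner + k]
                        row_b = k * n_cols
                        for j in range(j0, j1):
                            c[row_c + j] += a_ik * b[row_b + j]
    return c


def matmul_naive(a, b, n_rows, n_inner, n_cols):
    """Unblocked reference, used only to check the blocked version."""
    c = [0.0] * (n_rows * n_cols)
    for i in range(n_rows):
        for j in range(n_cols):
            total = 0.0
            for k in range(n_inner):
                total += a[i * n_inner + k] * b[k * n_cols + j]
            c[i * n_cols + j] = total
    return c
```

In pure Python this only shows the loop structure; any real speedup would come from applying the same tiling in the compiled code, with the tile size tuned so the working tiles fit in cache. Whether that beats the current implementation would need to be measured.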