This optimizes the dot product between two matrices. Both versions give the same result, but the original implementation allocated a new matrix for the intermediate result, while the new implementation needs no intermediate memory. (The inverse is still computed in both versions; that allocation is unavoidable.)
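The description does not show the two operations, so the following is only a minimal sketch of the kind of change it describes, not the actual diff. It assumes NumPy and assumes the "dot product between two matrices" is the elementwise (Frobenius) inner product of an inverse with a second matrix. The function names and test matrices are hypothetical.

```python
import numpy as np


def dot_with_temporary(a, b):
    """Hypothetical 'original' form: builds a full elementwise-product matrix."""
    a_inv = np.linalg.inv(a)      # the inverse is needed in both versions
    return np.sum(a_inv * b)      # a_inv * b allocates a new n-by-n matrix


def dot_without_temporary(a, b):
    """Hypothetical 'new' form: multiplies and accumulates in one pass."""
    a_inv = np.linalg.inv(a)      # the inverse is needed in both versions
    # einsum contracts both indices directly, so no n-by-n product is stored
    return np.einsum("ij,ij->", a_inv, b)


# Illustrative inputs (assumed): a well-conditioned matrix and a random one.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)
b = rng.standard_normal((4, 4))

# Both forms agree up to floating-point rounding.
print(np.allclose(dot_with_temporary(a, b), dot_without_temporary(a, b)))
```

Under these assumptions the only remaining allocation is the inverse itself, which matches the note above that the inverse cannot be avoided in either version.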