Motivation: Matrix multiplications are at the heart of many machine learning algorithms, and are often very expensive operations. Therefore, highly optimized kernels exist (e.g., in BLAS). These are typically tailored to common value types such as float or double. However, DAPHNE aims to be extensible w.r.t. the value type.
Task: Implement (in C++) efficient kernels for general matrix-matrix and matrix-vector multiplication of dense matrices. Like most DAPHNE kernels, these should receive the value type as a template parameter, so that various types like float and double, but also int64_t or uint8_t, can be used with the same algorithm implementation. The kernels should be able to handle views into an existing matrix.
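To illustrate what such a templated kernel could look like, here is a minimal sketch over raw buffers. The row-skip parameters (the element distance between the starts of consecutive rows) are what make the same code work on views into a larger matrix, where the row skip exceeds the number of columns. The function and parameter names are illustrative, not DAPHNE's actual DenseMatrix API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hedged sketch: generic matrix-matrix multiply, res = lhs * rhs, where
// lhs is (m x k) and rhs is (k x n). Each matrix may be a view, so row
// starts are addressed via an explicit row skip instead of the column count.
template <typename VT>
void matMulNaive(VT *res, const VT *lhs, const VT *rhs,
                 std::size_t m, std::size_t k, std::size_t n,
                 std::size_t rowSkipRes, std::size_t rowSkipLhs,
                 std::size_t rowSkipRhs) {
    for (std::size_t i = 0; i < m; i++)
        for (std::size_t j = 0; j < n; j++) {
            VT sum = 0;
            for (std::size_t l = 0; l < k; l++)
                sum += lhs[i * rowSkipLhs + l] * rhs[l * rowSkipRhs + j];
            res[i * rowSkipRes + j] = sum;
        }
}
```

For a contiguous (non-view) matrix, the row skip simply equals the number of columns; the same instantiation works for int64_t, uint8_t, float, etc.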
Hints:
Aim for an efficient implementation by employing techniques such as cache blocking and special code paths for, e.g., matrix-matrix, matrix-vector, and vector-vector multiplication.
Explicit multi-threading is not required here, since this will be handled by DAPHNE’s vectorized engine. However, if you are interested, you can also implement an alternative explicitly multi-threaded variant.
Add the new kernels in src/runtime/local/kernels/MatMul.h. There are already BLAS-based specializations for DenseMatrix<float> and DenseMatrix<double>. This task is to add a partial specialization for DenseMatrix<VT> (for any value type).
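The cache-blocking technique mentioned in the hints can be sketched as follows: the three loops are split into tiles so that the working set of one tile fits in cache, and the innermost loop runs along contiguous rows of the result. This is a simplified sketch over contiguous buffers; the tile size is an assumption and would be tuned per target CPU in a real kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hedged sketch of cache blocking (tiling) for res = lhs * rhs with
// lhs (m x k) and rhs (k x n), all stored contiguously in row-major order.
template <typename VT>
void matMulBlocked(VT *res, const VT *lhs, const VT *rhs,
                   std::size_t m, std::size_t k, std::size_t n) {
    constexpr std::size_t B = 64; // tile edge length (illustrative choice)
    std::fill(res, res + m * n, VT(0));
    for (std::size_t ii = 0; ii < m; ii += B)
        for (std::size_t ll = 0; ll < k; ll += B)
            for (std::size_t jj = 0; jj < n; jj += B)
                // Multiply one tile of lhs with one tile of rhs and
                // accumulate into the corresponding tile of res.
                for (std::size_t i = ii; i < std::min(ii + B, m); i++)
                    for (std::size_t l = ll; l < std::min(ll + B, k); l++) {
                        const VT a = lhs[i * k + l];
                        for (std::size_t j = jj; j < std::min(jj + B, n); j++)
                            res[i * n + j] += a * rhs[l * n + j];
                    }
}
```

Note the i-l-j loop order inside a tile: hoisting `lhs[i * k + l]` out of the innermost loop lets both the rhs reads and the res writes stream along contiguous memory, which is typically friendlier to the cache and to auto-vectorization than the textbook i-j-l order.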
Possible task extensions for larger teams:
Also address special cases of matrix multiplication, such as the symmetric rank-k operation (syrk, src/runtime/local/kernels/Syrk.h). Furthermore, make your implementation of matrix-vector multiplication available in the gemv kernel (src/runtime/local/kernels/Gemv.h).
Also address sparse matrices by implementing efficient matrix multiplication kernels for CSRMatrix<VT>.
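For the sparse extension, the core building block is matrix-vector multiplication over the CSR layout, where each row's nonzeros are stored as a (value, column index) pair and row boundaries are given by an offsets array. The sketch below uses standard CSR arrays directly; the names are illustrative and not DAPHNE's actual CSRMatrix<VT> accessors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hedged sketch of sparse matrix-vector multiplication, res = A * x,
// with A in CSR format: values[p] and colIdxs[p] describe the p-th
// nonzero, and the nonzeros of row r occupy positions
// rowOffsets[r] .. rowOffsets[r+1]-1.
template <typename VT>
void csrMatVec(VT *res, const VT *values, const std::size_t *colIdxs,
               const std::size_t *rowOffsets, std::size_t numRows,
               const VT *x) {
    for (std::size_t r = 0; r < numRows; r++) {
        VT sum = 0;
        // Only the stored nonzeros of row r contribute to res[r].
        for (std::size_t p = rowOffsets[r]; p < rowOffsets[r + 1]; p++)
            sum += values[p] * x[colIdxs[p]];
        res[r] = sum;
    }
}
```

Sparse matrix-matrix multiplication is harder because the sparsity structure of the result is not known in advance; a common approach is a row-wise (Gustavson-style) algorithm with a dense accumulator per output row.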