This PR adds fully distributed testing. The distributed result matrix `C` is split into the lower-triangular part `L` and the upper-triangular part `U`. From the permutation vector, the distributed permutation matrix `P` is constructed. The initial matrix `A` is multiplied by `P` (equivalent to permuting the rows of `A`), and the Frobenius norm `||L*U - P*A||` is computed using COSTA and the provided ScaLAPACK. The result is considered correct if this norm is small enough.
To enable the testing, it is necessary to build conflux as follows:
Observe that this requires ScaLAPACK, as `pdgemm` is used for matrix multiplication.
When running the miniapp, it is possible to specify the print limit flag, e.g. with `-l 20`, in which case matrices with dimension less than 20 will be fully gathered to rank 0 and printed. This is useful for debugging purposes.
For example, the full output looks as follows:
In this case, the print limit was large enough to allow gathering the full matrices to rank 0 and printing them. Decreasing the print limit would only produce:
In this case, everything is computed in a distributed fashion and no rank holds the full matrix.
In both cases, the total Frobenius norm of `L*U - P*A` is `0`, indicating that the result is correct.
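For readers who want to see the check in miniature, here is a single-process sketch of the verification described above, written in plain Python rather than with COSTA and ScaLAPACK. It is not the code in this PR. The function names are hypothetical, and two conventions are assumed rather than taken from conflux: that `C` stores `L` below the diagonal with an implicit unit diagonal, and that entry `i` of the permutation vector names the row of `A` that ends up in row `i` of `P*A`.

```python
import math


def split_lu(C):
    """Split a packed LU result C into L and U.

    Assumption: L has an implicit unit diagonal (LAPACK-style packing).
    """
    n = len(C)
    L = [[C[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    U = [[C[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def permutation_matrix(perm):
    """Build P so that row i of P*A equals row perm[i] of A (assumed convention)."""
    n = len(perm)
    return [[1.0 if perm[i] == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    """Dense matrix product; stands in for the distributed pdgemm call."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def frobenius_residual(C, perm, A):
    """Return the Frobenius norm of L*U - P*A."""
    L, U = split_lu(C)
    LU = matmul(L, U)
    PA = matmul(permutation_matrix(perm), A)
    return math.sqrt(
        sum((LU[i][j] - PA[i][j]) ** 2 for i in range(len(PA)) for j in range(len(PA[0])))
    )


# Hand-computed 2x2 example: partial pivoting swaps the two rows of A.
A = [[2.0, 1.0], [4.0, 4.0]]
perm = [1, 0]
C = [[4.0, 4.0], [0.5, -1.0]]
print(frobenius_residual(C, perm, A))
```

Multiplying by `P` here only reorders the rows of `A`, which is why the check can equivalently be phrased as a row permutation. In floating-point arithmetic the residual of a correct factorization is generally small rather than exactly zero, so a tolerance is the natural acceptance criterion.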