In NNLSSolver::solve_parallel_with_scalapack, n_dist_loc_max is now determined from n_tot, not m (as it is supposed to). n_dist_loc_max only determines the memory size of the working array, not impacting the NNLS solution. Before this fix, the following working array variables were allocated with unnecessarily large sizes:
// The following matrices are stored in column-major format as Vectors
Vector mat_0_data(m * n_dist_loc_max, false);
Vector mat_qr_data(m * n_dist_loc_max, false);
hard-coded limit for number of processors
Per issue #269 , the number of processors that can be used for NNLSSolver is limited to 15. This limit is removed and tested through the unit test below.
termination criterion for NNLSSolver
Previously, NNLS iteration is terminated when $L_{\infty}$-norm (the maximum value) of the residual vector is below the threshold. Now NNLSSolver has an option to terminate when $L_2$-norm of the residual vector is below a corresponding threshold. This is set at initialization with the input argument NNLS_termination criterion.
unit test for NNLSSolver
A unit test for NNLSSolver::solve_parallel_with_scalapack is added.
This checks the solution approximates the system with desired tolerance, both in serial and parallel.
On LC quartz, the scaling test result is: (for 300x1000)
Number of processors
Time
1
908 ms
2
769 ms
4
563 ms
8
469 ms
12
463 ms
15
456 ms
Another test with a larger system (500x15000):
Number of processors
Time
1
24216 ms
2
13287 ms
4
7843 ms
8
5091 ms
12
4162 ms
15
3666 ms
This timing includes the entire setup for matrix-vector system aside from solve_parallel_with_scalapack.
minor fix/update for NNLSSolver
n_dist_loc_max
fixIn
NNLSSolver::solve_parallel_with_scalapack
,n_dist_loc_max
is now determined fromn_tot
, notm
(as it is supposed to).n_dist_loc_max
only determines the memory size of the working array, not impacting the NNLS solution. Before this fix, the following working array variables were allocated with unnecessarily large sizes:hard-coded limit for number of processors
Per issue #269 , the number of processors that can be used for NNLSSolver is limited to
15
. This limit is removed and tested through the unit test below.termination criterion for NNLSSolver
Previously, NNLS iteration is terminated when $L_{\infty}$-norm (the maximum value) of the residual vector is below the threshold. Now NNLSSolver has an option to terminate when $L_2$-norm of the residual vector is below a corresponding threshold. This is set at initialization with the input argument
NNLS_termination criterion
.unit test for NNLSSolver
A unit test for
NNLSSolver::solve_parallel_with_scalapack
is added. This checks the solution approximates the system with desired tolerance, both in serial and parallel.300x1000
)500x15000
):This timing includes the entire setup for matrix-vector system aside from
solve_parallel_with_scalapack
.