Dplasma warming run #47

PR based on https://bitbucket.org/icldistcomp/dplasma/pull-requests/88

For all DPLASMA testing drivers that support timing, add support for the --nruns option (defaults to 3 timed runs per execution), with one additional warmup iteration of the loop before the timed runs.

-x forces --nruns to 0, meaning that only the warmup run is executed (with no timing information displayed), to prepare the matrices to check.

Each DPLASMA tester does nruns+1 iterations of its main operation, and only the timings of the last nruns iterations are displayed, to remove artefacts like the cost of initializing the mathematical library. The main operation is sometimes hard to define for testers that involve multiple DAGs; in that case, each DAG is executed nruns+1 times.
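
The loop structure described above can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: `run_with_warmup`, `operation_fn`, `wall_seconds`, and `count_calls` are hypothetical names, and the real testers use their own timing and option-handling code.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for a tester's main operation. */
typedef void (*operation_fn)(void *arg);

/* Wall-clock time in seconds (C11 timespec_get). */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/*
 * Run `op` nruns+1 times. Iteration 0 is the warmup: it is executed
 * but its timing is neither stored nor printed. The timings of the
 * remaining nruns iterations are stored in `timings` and printed.
 * Returns the number of timings reported (equal to nruns).
 */
int run_with_warmup(operation_fn op, void *arg, int nruns, double *timings)
{
    int reported = 0;
    for (int t = 0; t < nruns + 1; t++) {
        double start = wall_seconds();
        op(arg);
        double elapsed = wall_seconds() - start;
        if (t == 0)
            continue; /* warmup iteration: discard the timing */
        timings[reported++] = elapsed;
        printf("run %d: %.6f s\n", t, elapsed);
    }
    return reported;
}

/* Hypothetical operation used for illustration: counts its invocations. */
void count_calls(void *arg)
{
    ++*(int *)arg;
}
```

With nruns set to 3 (the default mentioned above), the operation runs four times and three timings are printed. With nruns set to 0, as forced by -x, only the warmup iteration runs and nothing is printed.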

This patch also introduces fixes in a few benchmarks: zheev, the CUDA-enabled DTD (which did not handle the case where CUDA support is compiled in but no CUDA device is available), and a few other issues.
