Handling large matrices

Previously, full matrices were allocated on every node, which limited the maximum problem size that could be run. However, our tests still assume that rank 0 can collect the full matrix and check its correctness. We therefore had to stop allocating full matrices on each node while still keeping the test validation.

This PR addresses this as follows:

The main input matrix is now distributed and initialized using COSTA.
The matrix C is allocated only when validation is enabled.
The matrix B (used for validation within B_Win) is removed, and the window now uses the matrix C instead.
To keep the validation, the hardcoded test cases are passed to COSTA through a lambda function that performs the initialization.
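
The idea behind the last two points can be sketched as follows. This is a minimal, self-contained illustration and not the code in this PR: it assumes a square matrix in a block-cyclic layout, and the names `BlockCyclic`, `local_extent`, `local_to_global`, `init_local` and `make_validation_buffer` are hypothetical. It does not call COSTA or MPI; it only shows how a lambda over global indices lets each process fill its own tiles without ever holding the full matrix, and how a validation buffer can be allocated on demand.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical description of a block-cyclic distribution of an n x n matrix.
struct BlockCyclic {
    int n;      // global matrix dimension
    int block;  // tile size along each dimension
    int prow;   // number of process rows in the grid
    int pcol;   // number of process columns in the grid
};

// Number of rows (or columns) owned by process coordinate p out of np.
int local_extent(int n, int block, int p, int np) {
    int count = 0;
    for (int start = p * block; start < n; start += np * block) {
        count += std::min(block, n - start);  // the last tile may be partial
    }
    return count;
}

// Map a local row (or column) index to its global index.
int local_to_global(int l, int block, int p, int np) {
    int tile = l / block;  // index of the local tile
    return (tile * np + p) * block + l % block;
}

// Fill only the locally owned part (row-major) by evaluating f at global indices.
std::vector<double> init_local(const BlockCyclic& d, int my_row, int my_col,
                               const std::function<double(int, int)>& f) {
    int rows = local_extent(d.n, d.block, my_row, d.prow);
    int cols = local_extent(d.n, d.block, my_col, d.pcol);
    std::vector<double> local(static_cast<std::size_t>(rows) * cols);
    for (int i = 0; i < rows; ++i) {
        int gi = local_to_global(i, d.block, my_row, d.prow);
        for (int j = 0; j < cols; ++j) {
            int gj = local_to_global(j, d.block, my_col, d.pcol);
            local[static_cast<std::size_t>(i) * cols + j] = f(gi, gj);
        }
    }
    return local;
}

// Allocate the full n x n validation buffer only when validation is requested.
std::optional<std::vector<double>> make_validation_buffer(int n, bool validate) {
    if (!validate) {
        return std::nullopt;  // no full-size allocation in production runs
    }
    return std::vector<double>(static_cast<std::size_t>(n) * n, 0.0);
}
```

Under these assumptions, a hardcoded test case becomes a lambda such as `[](int i, int j) { return reference[i][j]; }` passed to `init_local`, so the same values reach every process without a full copy of the matrix on each node.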

eth-cscs/conflux #16