MPI-parallelization of sktwocnt

Straightforward (non-distributed), optional MPI parallelization of the sktwocnt binary, employing the following strategy:

fixed maximum distance (static tabulation): Creates a single batch of dimer distances that is distributed over the available MPI ranks. Nothing fancy: MPI ranks exceeding the total number of distances stay idle. We might want to print a warning or even block such calculations.
dynamic batches with maximum distance (2 cases: converged or max. distance reached): If the number of MPI ranks is smaller than the number of dimer distances contained in the default batch length of 1 Bohr, the distances of each batch are distributed over the ranks. If the number of ranks exceeds the number of distances within the default batch length, the batch size is automatically increased to accommodate as many dimer distances as there are MPI ranks. This mostly avoids idle ranks, but an increased batch size later requires the Hamiltonian and overlap data to be cropped to the size one would have obtained with the default batch length of 1 Bohr. Otherwise, the length of the converged SK-tables would depend on the number of ranks.
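
For illustration only, the batching logic described above could be sketched as follows. This is a serial Python sketch, not the Fortran code in this PR. All function names are hypothetical. The round-robin assignment of distances to ranks and the convergence criterion (all values of a batch below a tolerance) are assumptions made for the example.

```python
def assign_round_robin(n_points, n_ranks):
    """Return, for each rank, the indices of the dimer distances it computes.

    Assumption: distances are dealt out round-robin. Ranks beyond
    n_points receive an empty list, i.e. they idle.
    """
    return [list(range(rank, n_points, n_ranks)) for rank in range(n_ranks)]


def batch_points(n_default, n_ranks):
    """Number of distances per batch.

    n_default is the number of grid points in the default 1 Bohr batch.
    The batch is only enlarged if there are more ranks than points.
    """
    return max(n_default, n_ranks)


def cropped_length(values, n_default, tol):
    """Table length a run with the default batch size would have produced.

    Assumption: a batch counts as converged once all of its values are
    below tol in magnitude. Only complete default-sized batches are
    inspected; if none is converged, nothing is cropped.
    """
    n_full = len(values) // n_default
    for i in range(n_full):
        chunk = values[i * n_default:(i + 1) * n_default]
        if all(abs(v) < tol for v in chunk):
            return (i + 1) * n_default
    return len(values)
```

With 4 points per default batch and 16 ranks, `batch_points(4, 16)` enlarges the batch to 16 distances. `cropped_length` then trims the result back to the end of the first converged default-sized batch, so the table length does not depend on the rank count.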

To be merged after #90 (needs to be rebased).
