Computation of d3 and d4 very slow for systems with many symmetries

In the MPI_Lanczos branch, the application of d3 and d4 within the dynamical Lanczos is very slow if the system has many symmetries. This is because the symmetries are not factorized but applied all together (to avoid memory allocation).

This overhead can be reduced in several ways.

EASY FIXES:
1) Currently the symmetries are applied in the full block of degenerate modes. However, each symmetry has its own block of modes on which it can create degeneracies, so instead of using the full degenerate space, use the space given by each symmetry operation.
2) Use the complex polarization vectors that diagonalize the translations. In this way, degenerate modes at different q points do not mix with each other, and the relevant degenerate space is reduced to the one generated by the symmetries in the small group of each q point.
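A minimal sketch of easy fix 1, assuming a symmetry is available as a dense matrix `S` in the basis of one degenerate block. The helper `symmetry_blocks` and the tolerance `tol` are hypothetical and not part of python-sscha; the sketch only shows how the modes actually mixed by a single symmetry could be grouped into smaller sub-blocks.

```python
import numpy as np

def symmetry_blocks(S, tol=1e-10):
    """Group mode indices that a symmetry matrix S couples together.

    Hypothetical helper: two modes end up in the same group if they are
    connected through entries of S larger than tol (union-find).
    """
    n = S.shape[0]
    parent = list(range(n))

    def find(i):
        # Find the representative of i, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    rows, cols = np.nonzero(np.abs(S) > tol)
    for i, j in zip(rows.tolist(), cols.tolist()):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Example: three degenerate modes, but this symmetry only mixes the first two.
theta = 0.3
S_example = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
blocks = symmetry_blocks(S_example)
print(blocks)
```

In this example the symmetry would be applied on a 2-mode block and a 1-mode block instead of the full 3-mode degenerate space.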

COMPLETE FIX: Since the symmetrization can in principle be applied in sequential loops rather than nested loops, convert the nested loops into sequential loops. However, this requires storing the full result after each loop, which, in the case of d4, may require too much memory (N_degeneracy^4). If this fix is combined with the two easy fixes, however, the degenerate space is much smaller and becomes independent of the supercell size, so the intermediate result can be stored. This should make the time spent in the symmetrization negligible.
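The difference between nested and sequential loops can be sketched on a rank-3 tensor restricted to a degenerate block. This is an illustration only: the function names, the dense matrix `S`, and the random test data are assumptions, not the code in the MPI_Lanczos branch. The nested version contracts all three indices at once for each output element (cost of order N^6), while the sequential version contracts one index at a time (three steps of order N^4) at the price of storing an intermediate tensor of the full size.

```python
import numpy as np

def apply_nested(S, T):
    """Rotate all three indices of T at once, one output element at a time."""
    n = S.shape[0]
    out = np.zeros_like(T)
    for a in range(n):
        for b in range(n):
            for c in range(n):
                # Full triple sum over (i, j, k) for every (a, b, c).
                out[a, b, c] = np.einsum("i,j,k,ijk->", S[a], S[b], S[c], T)
    return out

def apply_sequential(S, T):
    """Rotate one index per step, storing the intermediate tensor each time."""
    step = np.einsum("ai,ijk->ajk", S, T)     # first index
    step = np.einsum("bj,ajk->abk", S, step)  # second index
    step = np.einsum("ck,abk->abc", S, step)  # third index
    return step

# Random stand-ins for a symmetry matrix and a d3-like block (assumed data).
rng = np.random.default_rng(0)
n_deg = 4
S = rng.standard_normal((n_deg, n_deg))
T = rng.standard_normal((n_deg, n_deg, n_deg))

same = np.allclose(apply_nested(S, T), apply_sequential(S, T))
print(same)
```

The stored intermediate has N_degeneracy^3 entries for d3 and N_degeneracy^4 for d4. In double precision, a d4 intermediate with N_degeneracy = 100 takes 100^4 × 8 bytes = 0.8 GB, which is why this approach is only practical once the degenerate space has been reduced.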

SSCHAcode / python-sscha