add support for GPUs using OpenACC #475

@orvedahl documented some relevant tests for running Rayleigh on GPUs using OpenACC in this repository.

Before starting the actual port, we should enable support for the relevant compilers in the Rayleigh build infrastructure. For Nvidia GPUs that would be nvfortran, the Fortran compiler from the NVIDIA HPC SDK (formerly the PGI compiler). Because it lacks support for quad-precision floating-point numbers, which we use in Math_Layer/Legendre_Polynomials.F90, we need to find a way around this.

There are several options to fix this:

1. Rewrite Legendre_Polynomials.F90 to avoid the use of quad-precision numbers. We first need to check whether the extra precision is actually needed.
2. Implement quad precision through an external library, such as GMP.
3. Build the code using gfortran with its OpenACC support for Nvidia-PTX offloading. I have tested this and it works, but the downside is that we would essentially always have to build our own compiler on each Nvidia GPU enabled cluster.
4. Build only Legendre_Polynomials.F90 with a different compiler (e.g., gfortran). This is hard to do, because the .mod file formats of nvfortran and gfortran are not compatible.

I am leaning towards option 1, falling back to option 2 if that doesn't work.
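For option 1, one way to check whether the extra precision matters is to compare a double-precision evaluation against an exact reference. The sketch below is only an illustration: it uses the plain three-term (Bonnet) recurrence for Legendre polynomials as a stand-in and is not the algorithm in Legendre_Polynomials.F90. The function names, the degree 50, and the sample point 0.3 are arbitrary assumptions. A real check would have to port the actual recurrence and test it at the resolutions we run.

```python
from fractions import Fraction


def legendre_float(lmax, x):
    """Return [P_0(x), ..., P_lmax(x)] using the Bonnet recurrence in double precision."""
    p = [1.0, x]
    for l in range(1, lmax):
        # (l + 1) P_{l+1}(x) = (2l + 1) x P_l(x) - l P_{l-1}(x)
        p.append(((2 * l + 1) * x * p[l] - l * p[l - 1]) / (l + 1))
    return p[: lmax + 1]


def legendre_exact(lmax, x):
    """Same recurrence in exact rational arithmetic, used as the reference."""
    xq = Fraction(x)  # exact rational value of the double-precision input
    p = [Fraction(1), xq]
    for l in range(1, lmax):
        p.append(((2 * l + 1) * xq * p[l] - l * p[l - 1]) / (l + 1))
    return p[: lmax + 1]


def max_abs_error(lmax, x):
    """Largest absolute difference between the double and exact results over all degrees."""
    approx = legendre_float(lmax, x)
    exact = legendre_exact(lmax, x)
    return float(max(abs(Fraction(a) - e) for a, e in zip(approx, exact)))


if __name__ == "__main__":
    # Hypothetical sample point and degree, chosen only for illustration.
    print(max_abs_error(50, 0.3))
```

If the reported error stays near double-precision round-off for the degrees and grid points we care about, that would be one piece of evidence that quad precision can be dropped. It would not settle the question for the normalized associated functions on its own.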

Once the build system works, we should implement @orvedahl's changes to the loops and also explore whether we can make use of direct GPU-to-GPU MPI communication. That would hopefully allow us to keep the data on the GPU for the whole computation, except for I/O.
