These changes replace the accfft_plan/accfft_plan_gpu structs with new AccFFT/AccFFT_gpu classes and use templates to combine the float and double precision code. The result is a smaller code base that is easier to maintain. All tests appear to pass.
Note: the diff between accfft_gpu and accfft_gpuf shows several calls to cudaDeviceSynchronize that appear only in accfft_gpu. These have been commented in the combined code and marked "DOUBLE ONLY".
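
For reviewers who want a picture of the general idea, here is a minimal sketch of how one class template can stand in for separate float and double code paths, with a per-precision flag for behavior that only one of them had. It is not the code in this PR: `PrecisionTraits`, `PlanSketch`, `sync_after_kernel`, `buffer_bytes` and `needs_sync` are hypothetical names made up for illustration, and no CUDA or FFT library is called.

```cpp
#include <complex>
#include <cstddef>

// Hypothetical trait: collects the details that differ between precisions.
template <typename T>
struct PrecisionTraits;

template <>
struct PrecisionTraits<float> {
    // The single precision path had no extra synchronization.
    static constexpr bool sync_after_kernel = false;
};

template <>
struct PrecisionTraits<double> {
    // Stands in for the calls marked "DOUBLE ONLY".
    static constexpr bool sync_after_kernel = true;
};

// Hypothetical plan class: one template instead of two near-identical copies.
template <typename T>
class PlanSketch {
public:
    explicit PlanSketch(std::size_t n) : n_(n) {}

    // Bytes needed for n complex values of precision T.
    std::size_t buffer_bytes() const { return n_ * sizeof(std::complex<T>); }

    // Whether this precision would synchronize after a kernel launch.
    // In real GPU code this is where cudaDeviceSynchronize() would be guarded.
    bool needs_sync() const { return PrecisionTraits<T>::sync_after_kernel; }

private:
    std::size_t n_;
};
```

With something like this, `PlanSketch<float>` and `PlanSketch<double>` share all the common logic, and the precision-specific synchronization stays visible in one place rather than being lost in the merge. Whether the extra synchronization is actually required for double precision is a separate question that this sketch does not answer.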