pre-processing GPU support

Is your feature request related to a problem? Please describe. Now that pre-processing makes use of internal numpy functions and is multi-threaded we are now in a position to make use of gpu support through cupy.

Describe the solution you'd like Enable GPU-based pre-processing

Additional context A simple change to _resample to:

def _resample(data, delta, factor, sampling_rate, large_w, gpu):
    """
    Resample data in the frequency domain - adapted from obspy resample method

    :param data: Data to resample
    :type data: np.ndarray
    :param delta: Sample interval in seconds
    :type delta: float
    :param factor: Factor to resample by
    :type factor: float
    :param sampling_rate: Desired sampling-rate
    :type sampling_rate: float
    :param large_w: Window to apply to spectra to stabalise resampling
    :type large_w: np.ndarray
    :param gpu: Whether to work on the gpu or not - uses cupy under the hood
    :type gpu: bool

    :return: np.ndarray of resampled data.
    """
    if factor == 1:
        # No resampling needed, don't waste time.
        return data
    if gpu:
        import cupy as np
        # Move data to gpu
        data = np.asarray(data)
        large_w = np.asarray(large_w)
    else:
        import numpy as np
    # Need to work with numpy objects to release the GIL
    npts = data.shape[0]
    df = np.float32(1.0) / (npts * delta)
    num = np.int32(npts / factor)
    d_large_f = np.float32(1.0) / num * sampling_rate

    # Forward fft
    x = np.fft.rfft(data)
    # Window
    x *= large_w[:npts // 2 + 1]

    # interpolate
    f = df * np.arange(0, npts // 2 + 1, dtype=np.int32)
    n_large_f = num // 2 + 1
    large_f = d_large_f * np.arange(0, n_large_f, dtype=np.int32)

    # Have to split into real and imaginary parts for interpolation.
    y = np.interp(large_f, f, np.real(x)) + (1j * np.interp(
        large_f, f, np.imag(x)))
    # Try to reduce memory before doing the ifft
    del large_f, f, x

    out = np.fft.irfft(y, n=num)[0:num] * (np.float32(num) / np.float32(npts))
    if gpu:
        return out.asnumpy()
    return out

results in a 10x speedup for me for data 8640000 samples long (1 day at 100 Hz) resampled by 2.5. I haven't tested multi-threading support for this, but it should be GIL releasing. We might run into VRAM issues though, but we might run into RAM issues with CPU-based anyway as well as we have always done.

Changes like these should be simple to make to detrend and to filtering, but resample is the slowest part of pre-processing (that could be sped up by this).

eqcorrscan / EQcorrscan

pre-processing GPU support #542