Provide utility to downsample in time/frequency before output

At least for the case where we just want to downsample a 2D array along one axis, I think something like the following should work as a kernel. Each thread averages `factor` consecutive samples of one channel:

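```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Average every `factor` consecutive samples along the time (fast) axis.
// input:  inputHeight rows (channels) x inputWidth columns (samples), row-major
// output: inputHeight rows x outputWidth columns, outputWidth = inputWidth / factor
__global__ void downsample2D_mean(const float *input, float *output,
                                  int inputHeight, int inputWidth,
                                  int outputWidth, int factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // output sample (column)
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // channel (row)

    if (i < outputWidth && j < inputHeight) {
        int start_idx = j * inputWidth + i * factor;  // first input sample of this bin
        double sum = 0.0;                              // accumulate in double to limit rounding error
        for (int k = 0; k < factor; k++) {
            sum += (double)input[start_idx + k];
        }
        output[j * outputWidth + i] = (float)(sum / (double)factor);
    }
}
```

and launching it would look something like:

```cuda
// inputHeight = nchan
// inputWidth  = nsamps
// outputWidth = inputWidth / dsfact  (integer division: trailing samples are dropped
//                                     if dsfact does not divide nsamps)

dim3 threadsPerBlock(16, 16);
dim3 numBlocks(
    (outputWidth + threadsPerBlock.x - 1) / threadsPerBlock.x,  // x covers output samples
    (inputHeight + threadsPerBlock.y - 1) / threadsPerBlock.y   // y covers channels
);

// Launch the kernel and check that the launch succeeded
downsample2D_mean<<<numBlocks, threadsPerBlock>>>(d_input, d_output, inputHeight, inputWidth, outputWidth, dsfact);
cudaError_t err = cudaGetLastError();
if (err != cudaSuccess) {
    fprintf(stderr, "downsample2D_mean launch failed: %s\n", cudaGetErrorString(err));
}
```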
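To check the GPU output, a plain C reference on the host could be compared element by element against `d_output` once it is copied back. This is a minimal sketch assuming the row-major nchan × nsamps layout given in the launch comments; `downsample2D_mean_ref` is a hypothetical name, not an existing vcsbeam function.

```c
#include <stddef.h>

/* Hypothetical host-side reference: average each run of `factor` consecutive
 * samples along the time (fast) axis.
 * input:  nchan x nsamps, row-major (one row per channel)
 * output: nchan x (nsamps / factor), row-major
 * Trailing samples that do not fill a complete bin are ignored, matching the
 * integer division used for outputWidth in the launch code. */
void downsample2D_mean_ref(const float *input, float *output,
                           int nchan, int nsamps, int factor)
{
    int nout = nsamps / factor;
    for (int c = 0; c < nchan; c++) {
        for (int i = 0; i < nout; i++) {
            /* first input sample of output bin i in channel c */
            const float *bin = input + (size_t)c * nsamps + (size_t)i * factor;
            double sum = 0.0;  /* double accumulator, as in the kernel */
            for (int k = 0; k < factor; k++) {
                sum += (double)bin[k];
            }
            output[(size_t)c * nout + i] = (float)(sum / (double)factor);
        }
    }
}
```

For example, with 2 channels of 6 samples and `factor = 2`, channel 0 = {1, 2, 3, 4, 5, 6} becomes {1.5, 3.5, 5.5}.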

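Sensible factors (given that our `inputWidth` will likely be 10000 samples = 1 second) are the divisors of 10000, so that each output sample averages a whole number of input samples and none are dropped: 1, 2, 4, 5, 8, 10, 16, 20, 25, 40, 50, 80, 100, 125, 200, 250, 400, 500, 625, 1000, 1250, 2000, 2500, 5000, and 10000.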
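Rather than hard-coding that list, the valid factors could be generated from the block length. A minimal sketch, assuming a factor is only "sensible" if it divides nsamps exactly; `valid_downsample_factors` is a hypothetical helper.

```c
/* Hypothetical helper: write every divisor of nsamps, in ascending order, into
 * `factors` (at most max_factors entries are written) and return the total
 * number of divisors found. Only these factors split nsamps into whole bins,
 * so nsamps / factor loses no samples. */
int valid_downsample_factors(int nsamps, int *factors, int max_factors)
{
    int n = 0;
    for (int f = 1; f <= nsamps; f++) {
        if (nsamps % f == 0) {
            if (n < max_factors) {
                factors[n] = f;
            }
            n++;
        }
    }
    return n;
}
```

For nsamps = 10000 (= 2^4 × 5^4) this returns 25, one for each factor in the list above.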

CIRA-Pulsars-and-Transients-Group / vcsbeam #23