Open bwmeyers opened 1 year ago
At least in the case where we just want to downsample a 2D array along 1 axis, something like this should work as a kernel, I think.
#include <stdio.h>

// Average every `factor` consecutive time samples within each channel.
// Assumed layout (row-major): input is inputHeight x inputWidth,
// output is inputHeight x outputWidth, with outputWidth = inputWidth / factor.
__global__ void downsample2D_mean(const float *input, float *output, int inputHeight, int inputWidth, int outputWidth, int factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // output sample index
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // channel (row) index
    if (i < outputWidth && j < inputHeight) {
        // First input sample contributing to output element (j, i)
        int start_idx = j * inputWidth + i * factor;
        double sum = 0.0;
        for (int k = 0; k < factor; k++) {
            sum += (double)input[start_idx + k];
        }
        output[j * outputWidth + i] = (float)(sum / (double)factor);
    }
}
and to launch it would look something like
// inputHeight = nchan
// inputWidth = nsamps
// outputWidth = inputWidth/dsfact (dsfact must divide inputWidth exactly)
dim3 threadsPerBlock(16, 16);
dim3 numBlocks(
    (outputWidth + threadsPerBlock.x - 1) / threadsPerBlock.x,
    (inputHeight + threadsPerBlock.y - 1) / threadsPerBlock.y
);
// Launch the kernel
downsample2D_mean<<<numBlocks, threadsPerBlock>>>(d_input, d_output, inputHeight, inputWidth, outputWidth, dsfact);
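To sanity-check the kernel output, a host-side reference could be useful. The following is only a sketch: the function name `downsample2d_mean_ref` is hypothetical, and it assumes a row-major layout (`input[chan * nsamps + samp]`) and a factor that divides `nsamps` exactly. It computes the intended result, i.e. the mean of each run of `factor` consecutive samples within a channel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of the proposed kernel.
// Assumes row-major storage: input[chan * nsamps + samp].
std::vector<float> downsample2d_mean_ref(const std::vector<float>& input,
                                         int nchan, int nsamps, int factor) {
    assert(factor > 0 && nsamps % factor == 0);
    assert(input.size() == static_cast<std::size_t>(nchan) * nsamps);

    const int nout = nsamps / factor;  // samples per channel after downsampling
    std::vector<float> output(static_cast<std::size_t>(nchan) * nout);

    for (int chan = 0; chan < nchan; ++chan) {
        for (int i = 0; i < nout; ++i) {
            // First input sample contributing to output element (chan, i)
            const std::size_t start = static_cast<std::size_t>(chan) * nsamps
                                    + static_cast<std::size_t>(i) * factor;
            double sum = 0.0;  // accumulate in double, as the kernel does
            for (int k = 0; k < factor; ++k) {
                sum += input[start + k];
            }
            output[static_cast<std::size_t>(chan) * nout + i] =
                static_cast<float>(sum / factor);
        }
    }
    return output;
}
```

Comparing this against the data copied back from `d_output` (within a small floating-point tolerance) should catch indexing mistakes early.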
Sensible factors (given that our "inputWidth" will likely be 10000 samples = 1 second) are the divisors of 10000: 1, 2, 4, 5, 8, 10, 16, 20, 25, 40, 50, 80, 100, 125, 200, 250, 400, 500, 625, 1000, 1250, 2000, 2500, 5000, and 10000.
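Rather than hard-coding that list, the allowed factors could be derived from the number of samples. A minimal sketch, assuming "sensible" simply means "divides the sample count exactly"; the function name `valid_downsample_factors` is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns every positive divisor of nsamps, ascending.
std::vector<int> valid_downsample_factors(int nsamps) {
    assert(nsamps > 0);
    std::vector<int> factors;
    // Walk candidates up to sqrt(nsamps); each hit d also yields nsamps / d.
    for (int d = 1; static_cast<long long>(d) * d <= nsamps; ++d) {
        if (nsamps % d == 0) {
            factors.push_back(d);
            if (d != nsamps / d) {
                factors.push_back(nsamps / d);
            }
        }
    }
    std::sort(factors.begin(), factors.end());
    return factors;
}
```

For `nsamps = 10000` this yields 25 factors, from 1 through 10000, so a user-supplied `dsfact` could be validated against it before launching the kernel.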
Being able to specify the desired output time and/or frequency resolution could be really helpful in terms of managing data rates, since not all VCS science needs 100 µs/10 kHz.
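For the time axis, a user-facing option might take the desired resolution and convert it to a downsampling factor. This is a rough sketch only: the function name `factor_for_time_resolution` and the choice of integer microseconds are assumptions, and it rejects any request that cannot be met exactly.

```cpp
#include <cassert>

// Hypothetical helper: maps a requested output time resolution (in µs) to a
// downsampling factor. Returns -1 when the request cannot be met exactly.
int factor_for_time_resolution(int native_res_us, int target_res_us, int nsamps) {
    assert(native_res_us > 0 && nsamps > 0);
    if (target_res_us < native_res_us) return -1;       // cannot upsample
    if (target_res_us % native_res_us != 0) return -1;  // not an integer multiple
    const int factor = target_res_us / native_res_us;
    if (nsamps % factor != 0) return -1;                // would leave a partial bin
    return factor;
}
```

With a native resolution of 100 µs and 10000 samples per second, a request for 1000 µs maps to a factor of 10, while 150 µs or 300 µs would be rejected.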