The scaling_kernel was clearly designed to be a grid-stride kernel to multiply all elements by a constant; this correction makes it so.
The examples currently build and run correctly as-is. However, if the parameters are changed to larger values (as in 1d_r2c_c2r), the output is very likely to be wrong: only a few elements get scaled, and those few are scaled multiple times.
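For reference, here is a minimal sketch of a grid-stride scaling kernel. It is an illustration under stated assumptions, not the exact diff in this PR: `float2` stands in for `cufftComplex`, `run_scaling_check` is a hypothetical helper, and the 1 x 128 launch configuration is chosen only to force each thread through several loop iterations.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Grid-stride scaling: each thread handles indices tid, tid + stride,
// tid + 2 * stride, ... so every element is visited exactly once,
// however small the grid is relative to element_count.
// Assumption: float2 is used here in place of cufftComplex.
__global__ void scaling_kernel(float2 *data, int element_count, float scale) {
    const int tid = threadIdx.x + blockIdx.x * blockDim.x;
    const int stride = blockDim.x * gridDim.x;
    for (int i = tid; i < element_count; i += stride) {
        data[i].x *= scale;  // index with the loop variable i, not tid
        data[i].y *= scale;
    }
}

// Hypothetical helper (not part of the samples): fills a buffer with
// (1, 2), scales it on the device, and checks that every element was
// scaled exactly once.
bool run_scaling_check(int element_count, float scale) {
    std::vector<float2> host(element_count, make_float2(1.0f, 2.0f));
    const size_t bytes = host.size() * sizeof(float2);

    float2 *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    if (cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return false;
    }

    // Deliberately small grid: far fewer threads than elements.
    scaling_kernel<<<1, 128>>>(dev, element_count, scale);

    // This blocking copy also surfaces any error from the kernel launch.
    const cudaError_t err =
        cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (const float2 &v : host) {
        if (v.x != 1.0f * scale || v.y != 2.0f * scale) return false;
    }
    return true;
}
```

Calling `run_scaling_check(1 << 16, 0.5f)` from a `main` function exercises the case where the element count is much larger than the number of launched threads, which is the case a grid-stride loop is meant to handle.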