Mismatch between texture sampling in GPU/CPU implementations

macksal commented 3 years ago

I've noticed that in the CUDA implementation of the pixel-wise grain algorithm, the sampling of the input image when iterating through cells is as follows:

// within the iteration of CELLS
float4 src_im_sxsy = tex2D(tex_src_im, (int)max(floor(xGaussian),0.0), (int)max(floor(yGaussian),0.0));

This gets the value of the image at (xGaussian, yGaussian), which is the coordinates of the output pixel, not the input pixel covering the relevant cell.

The CPU implementation seems to be correctly based on (cellCornerX, cellCornerY):

float u = imgIn[  (unsigned int)(fmin(fmax(floor(cellCornerY),0.0),(float)(mIn-1)))*nIn+
                (unsigned int)(fmin(fmax(floor(cellCornerX),0.0),(float)(nIn-1)))];

I've read the published article (which has been a fantastic resource) which seems to confirm that the sampling should be at the cell's pixel coordinates:

alasdairnewson commented 3 years ago

Hello, Sorry for the late reply. I unfortunately do not have too much time to dedicate to this. However, maybe a point which could help. There are two versions of the algorithm, the "grain-wise" and the "pixel-wise" version. Only the pixel-wise version is parallelised (and therefore in the GPU version), and this algorithm looks at the output pixels. Maybe this will help. Kind regards, Alasdair Newson

macksal commented 3 years ago

Thanks Alasdair, I am comparing the GPU implementation of the pixel-wise algorithm in this repo and the CPU implementation of the same in the "film_grain_rendering" repo. It appears to be a typo so while I don't mind if it is fixed or not, hopefully the open issue will alert anyone who is also working with the algorithm :)

dderiso commented 2 years ago

Fantastic library! This was a useful catch. All you need to do is change this line

https://github.com/alasdairnewson/film_grain_rendering_gpu/blob/5695ff2abea762c90740a0ae309336dc9be4cd95/src/film_grain_rendering.cu#L359

to

float4 src_im_sxsy = tex2D(tex_src_im, (int)max(floor(cellCornerX),0.0), (int)max(floor(cellCornerY),0.0));

That said, there's not much of a difference visually.

alasdairnewson / film_grain_rendering_gpu

Mismatch between texture sampling in GPU/CPU implementations #3