Open nehpetsde opened 4 years ago
I found that after replacing lines 90-91 of the 'correlation_cuda_kernel.cu' file with the following code,
int32_t y1 = blockIdx.y * stride1 + max_displacement + kernel_rad;
int32_t x1 = blockIdx.z * stride1 + max_displacement + kernel_rad;
it works correctly for kernel_size > 1.
The correlation result for any kernel_size > 1 is incorrect. A trivial proof is the correlation of two 3x3 images of all ones using a 3x3 kernel, which should yield a scalar output of 1 (9 / 9). The erroneous result of 5 / 9 comes from a wrong index calculation for the kernel offset, which produces negative offsets into unallocated memory. https://github.com/NVIDIA/flownet2-pytorch/blob/71034046166735a79a5b82df78de72d806e82842/networks/correlation_package/correlation_cuda_kernel.cu#L114-L127
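To make the expected value concrete, here is a minimal CPU sketch of the all-ones case. It is not the repository's kernel: `patch_center`, `correlate_patch` and `all_ones_example` are hypothetical helper names, the images are assumed to be single-channel and unpadded, and only the zero-displacement term is computed. The patch centre uses the same expression as the replacement lines quoted in this thread, and the assertion checks that no kernel offset falls outside the image.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU reference, not the repository's CUDA kernel.
// Patch centre along one axis, using the same expression as the
// replacement lines quoted in this thread.
int patch_center(int block_idx, int stride1, int max_displacement, int kernel_rad) {
    return block_idx * stride1 + max_displacement + kernel_rad;
}

// Correlates the kernel_size x kernel_size patches centred at (y1, x1) in two
// single-channel images (zero displacement only) and normalises the sum by
// the number of kernel elements.
float correlate_patch(const std::vector<float>& a, const std::vector<float>& b,
                      int height, int width, int y1, int x1, int kernel_size) {
    const int kernel_rad = (kernel_size - 1) / 2;
    float sum = 0.0f;
    for (int j = -kernel_rad; j <= kernel_rad; ++j) {
        for (int i = -kernel_rad; i <= kernel_rad; ++i) {
            const int y = y1 + j;
            const int x = x1 + i;
            // A negative or out-of-range index here is the kind of error
            // described in the issue.
            assert(y >= 0 && y < height && x >= 0 && x < width);
            sum += a[y * width + x] * b[y * width + x];
        }
    }
    return sum / static_cast<float>(kernel_size * kernel_size);
}

// Two 3x3 all-ones images and a 3x3 kernel, with assumed parameters
// stride1 = 1 and max_displacement = 0.
float all_ones_example() {
    const int size = 3, kernel_size = 3, kernel_rad = 1;
    const std::vector<float> ones(size * size, 1.0f);
    const int c = patch_center(0, 1, 0, kernel_rad);  // centre index 1
    return correlate_patch(ones, ones, size, size, c, c, kernel_size);
}
```

Calling `all_ones_example()` sums nine products of 1 and divides by 9, which gives the 9 / 9 = 1 stated above. Comparing the CUDA output against a reference like this for kernel_size > 1 is one way to check a proposed index fix.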