Rapidly degrading adjoint test in 3D

grizzuti commented 2 years ago

Dear all,

I was testing cufinufft via the Python wrapper on some 3d problems, and I stumbled upon some accuracy issues when performing the adjoint test. That is, given the NFFT linear operator F,

dot(F*x,y) \approx dot(x,F'*y).

The adjoint test passes for very small-sized problems (32,32,32) (rel err around 1e-7 on my machine) but it gets quickly unacceptable for size (128,128,128) (rel err around 0.5, worst case). You can find below the code I was running.

Please let me know if this is somehow expected, or if it can be mitigated somehow.

Regards, Gabrio

# Module import
import numpy as np
import pycuda.autoinit
from pycuda.gpuarray import GPUArray, to_gpu
from cufinufft import cufinufft

# NFFT eval
def cufinufft3d_eval(u, kx, ky, kz, eps=1e-6):

     # Allocate memory on the GPU for output variable
     n_transf = 1
     U_gpu = GPUArray((n_transf, len(kx)), dtype=u.dtype)

     # Initialize the plan and set the points
     plan = cufinufft(2, u.shape, n_transf, eps=eps, dtype=kx.dtype)
     plan.set_pts(to_gpu(kx), to_gpu(ky), to_gpu(kz))

     # Execute the plan, reading from the array u and storing the result in U_gpu
     plan.execute(U_gpu, to_gpu(u))

     # Retreive the result from the GPU
     return U_gpu.get()[0,:]

def cufinufft3d_adjeval(U, kx, ky, kz, shape, eps=1e-6):

     # Allocate memory on the GPU for output variable
     n_transf = 1
     u_gpu = GPUArray((n_transf, shape[0], shape[1], shape[2]), dtype=U.dtype)

     # Initialize the plan and set the points
     plan = cufinufft(1, shape, n_transf, eps=eps, dtype=kx.dtype)
     plan.set_pts(to_gpu(kx), to_gpu(ky), to_gpu(kz))

     # Execute the plan, reading from the array U and storing the result in u_gpu
     plan.execute(to_gpu(U), u_gpu)

     # Retreive the result from the GPU
     return u_gpu.get()[0,:,:,:]

# 3D settings
# shape = (32,32,32)
# shape = (64,64,64)
shape = (128,128,128)
eps = 1e-7
u = np.random.randn(*shape).astype(np.complex64)+1j*np.random.randn(*shape).astype(np.complex64)
M = 10
kx = np.linspace(-np.pi,np.pi, num=M).astype(np.float32)
ky = np.linspace(-np.pi,np.pi, num=M).astype(np.float32)
kz = np.linspace(-np.pi,np.pi, num=M).astype(np.float32)

# Adjoint test
U  = cufinufft3d_eval(u, kx.flatten(), ky.flatten(), kz.flatten(), eps=eps)
V = np.random.randn(*U.shape).astype(np.complex64)+1j*np.random.randn(*U.shape).astype(np.complex64)
v = cufinufft3d_adjeval(V, kx.flatten(), ky.flatten(), kz.flatten(), shape, eps=eps)
a = np.vdot(U.flatten(),V.flatten())
b = np.vdot(u.flatten(),v.flatten())
np.linalg.norm(a-b)/np.linalg.norm(a) # rel err

janden commented 2 years ago

Thanks for bringing this up. Have you observed this with other NUFFT implementations, such as Finufft?

Will see if I can reproduce this.

janden commented 2 years ago

So I'm seeing errors of order one for all shapes when I run this locally. Will try on another machine. I can't see any real issue with your script though, so it's not clear to me what could be happening.

grizzuti commented 2 years ago

Hi Joakim, I did the same experiment with the cpu implementation of finufft; no issues there! Also, gpu 2D is fine. On my machine, inaccuracies are more pronounced for larger 3D experiments.

janden commented 2 years ago

So the problem seems to be hardware dependent. On my laptop (GeForce 940MX), all three sizes (32, 64, and 128) give an error of order one. Same when I run this on a P100. However, when I run on the V100/A100, I get an error of order 1e-7, 1e-6, and 1e-5, respectively.

My first guess was that this is memory related, but the P100 and V100 both have 16 GB of memory, so that's not it. Will dig deeper into this.

flatironinstitute / cufinufft

Rapidly degrading adjoint test in 3D #124