Open paquiteau opened 1 year ago
Hello @paquiteau, sorry for the late answer. I tried your script and there are two things to change in order to make it work:
- samples was not converted to float32 (it was still in float64), so you need to convert it.
- the complex image tensor does not work with the Genred syntax, so you need to convert the image tensor to a real type.
Here is the updated last part of your script:
samples = samples.astype(np.float32)
image = image.flatten().view("(2,)float32")
# Calls Forward
coeff_cpu = forward_op(samples, locs, image, backend="CPU").view(np.complex64)
coeff_gpu = forward_op(samples, locs, image, backend="GPU").view(np.complex64)
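To make the two conversions above easier to check in isolation, here is a minimal NumPy-only sketch. The arrays `samples` and `image` are hypothetical stand-ins with made-up shapes, not the data from the original script, and no KeOps call is involved. It only illustrates that the `"(2,)float32"` view exposes each complex64 value as a (real, imaginary) pair and that the result can be viewed back as complex64.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the arrays of the script (shapes are assumptions).
samples = rng.random((5, 2))  # NumPy creates float64 by default
image = (rng.random((4, 4)) + 1j * rng.random((4, 4))).astype(np.complex64)

# Fix 1: explicit conversion to float32.
samples32 = samples.astype(np.float32)

# Fix 2: reinterpret each complex64 value as two float32 values (real, imag).
# flatten() returns a contiguous copy, so the view is safe.
image_real = image.flatten().view("(2,)float32")  # shape (16, 2), dtype float32

# The same bytes can be read back as complex64, e.g. for the output coefficients.
image_back = image_real.view(np.complex64).reshape(image.shape)

print(samples32.dtype, image_real.dtype, image_real.shape)
```

The view does not change any values, it only changes how the buffer is interpreted, which is why the results of `forward_op` can be viewed back with `.view(np.complex64)`.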
Now, there is another issue with accuracy. When testing, you will see that it runs, but the np.allclose assertion will fail at the end. This is due to accuracy errors when using float32. If you compute the absolute and relative errors as follows:
print(np.linalg.norm(coeff_gpu-coeff_cpu))
print(np.linalg.norm(coeff_gpu-coeff_cpu)/np.linalg.norm(coeff_cpu))
it will give something around 5e-2 for the absolute error, and around 5e-6 for the relative error. The "normal" loss of accuracy should be around 10 times smaller (I checked it by randomly permuting the input data). This bad accuracy is due to the use of the use_fast_math CUDA option, as pointed out in this other issue. We will try to address this accuracy problem soon.
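Until then, one possible workaround is to compare the two results with an explicit tolerance instead of the np.allclose defaults (rtol=1e-5, atol=1e-8), which are applied element by element and are tight for float32 computations. The sketch below uses synthetic data, not the actual coefficients. The helper `error_report`, the noise level and the loosened tolerances are all assumptions chosen for illustration.

```python
import numpy as np


def error_report(result, reference):
    """Return the absolute and relative errors, measured in the 2-norm."""
    abs_err = np.linalg.norm(result - reference)
    rel_err = abs_err / np.linalg.norm(reference)
    return abs_err, rel_err


rng = np.random.default_rng(0)
n = 1000

# Synthetic "CPU" reference and a slightly perturbed "GPU" result (assumed data).
reference = (rng.standard_normal(n) + 1j * rng.standard_normal(n)).astype(np.complex64)
noise = (rng.standard_normal(n) + 1j * rng.standard_normal(n)).astype(np.complex64)
result = reference + np.complex64(1e-4) * noise

abs_err, rel_err = error_report(result, reference)
print("absolute error:", abs_err)
print("relative error:", rel_err)

# Default tolerances versus explicitly loosened ones (values are assumptions).
strict_ok = np.allclose(result, reference)
loose_ok = np.allclose(result, reference, rtol=1e-2, atol=1e-2)
print("default tolerances:", strict_ok, "loosened tolerances:", loose_ok)
```

Which tolerance is acceptable depends on the application, so the values above should be treated as placeholders rather than recommendations.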
Hello there,
I am trying to implement the non-uniform Fourier transform on GPU using PyKeops.
Here is a minimal (not) working example:
Whenever I run this with the GPU backend, I get the following traceback:
My GPU has 16 GB of VRAM, and given the small size I am using, I am not expecting to get an overflow.
Am I doing something wrong with Genred? Could the complex array be the culprit?
Thanks in advance!