fjarri / reikna

Pure Python GPGPU library
http://reikna.publicfields.net/
MIT License

FFT and double precision #2

Closed (tnorth closed this issue 11 years ago)

tnorth commented 11 years ago

Hello Bogdan,

I am working with FFTs in double precision and expected an error much smaller than 1e-6, since that is roughly what the pyfft docs report for single precision. (Can I really have such expectations, by the way? It looks like the CPU can manage it...)

So I tried the (quick and dirty) attached program, which takes a Gaussian as the input signal. Using Tigger, I was expecting to see a big difference between complex128 and complex64, but I don't.

Here is the output of the attached program. I have also noticed that adding the scale_const(N) transformation increases the error by a huge amount, which I didn't expect (it goes away when lines 29 and 33 are commented out).

FFT error vs numpy implementation : {'double': 1.0374786210703966e-06, 'single': 5.1033833088560074e-06}
FFT => IFFT error:  {'double': 0.088621340270303342, 'single': 0.088621340270303342}  # This is huge, and goes to 1e-8 without the scaling
numpy FFT=>IFFT error (complex128):  4.35576763895e-17
numpy FFT=>IFFT error (complex64):  1.50743731241e-11
With PyFFT, single precision :  5.11018247625e-06
With PyFFT, single precision. FFT => IFFT :  2.13039431111e-08

(Also, pyFFT behaves in a weird fashion with complex128: I get NaNs in the output vector.)

Is it possible that for some reason the FFT is always done in single precision?

Thanks!

import numpy as np
try:
    import tigger.cluda as cluda
    from tigger.core import Transformation
    from tigger.fft import FFT
    from tigger.elementwise import specialize_elementwise
    import tigger.transformations as transformations
except:
    print "No PyOpenCL/PyFFT detected"

api = cluda.ocl_api()
ctx = api.Context.create(fast_math=True)

N = 2**16
t = np.linspace(-1, 1, N)
a = np.exp(-(t / 0.1)**2).astype(np.complex128) # Some gaussian
af = np.fft.fft(a) # Reference FFT
af64 = np.fft.fft(a.astype(np.complex64))

dtypes = {'single':np.complex64, 'double':np.complex128}
a_gpu, af_gpu, afi_gpu, comp, compi = {}, {}, {}, {}, {}
err, erri = {}, {}
k = 0
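# For each precision, run a forward and an inverse FFT on the device and compare with the numpy reference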
for name, type_ in dtypes.iteritems():
    a_gpu[name] = ctx.to_device(a.astype(type_))
    af_gpu[name] = ctx.allocate(a.shape, dtype=type_)
    afi_gpu[name] = ctx.allocate(a.shape, dtype=type_)
    comp[name] = FFT(ctx)
    comp[name].connect(transformations.scale_const(type_(N)), 'output', ['output_scaled'])
    comp[name].prepare_for(af_gpu[name], a_gpu[name], 1)
    comp[name](af_gpu[name], a_gpu[name], 1)
    compi[name] = FFT(ctx)
    compi[name].connect(transformations.scale_const(type_(1 / N)), 'output', ['output_scaled'])
    compi[name].prepare_for(afi_gpu[name], af_gpu[name], -1)
    compi[name](afi_gpu[name], af_gpu[name], -1)
    err[name] = np.abs(np.sum(np.abs(af_gpu[name].get()) - np.abs(af)) / af.size)
    erri[name] = np.abs(np.sum(np.abs(afi_gpu[name].get()) - np.abs(a)) / a.size)

print "FFT error vs numpy implementation :", err
print "FFT => IFFT error: ", erri
print "numpy FFT=>IFFT error (complex128): ", np.abs(np.sum(np.abs(a) - np.abs(np.fft.ifft(af))) / af.size)
print "numpy FFT=>IFFT error (complex64): ", np.abs(np.sum(np.abs(a) - np.abs(np.fft.ifft(af64))) / af.size)

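# Same check with pyfft (single precision only), for comparison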
from pyfft.cl import Plan
import pyopencl as cl
import pyopencl.array as cl_array

ctx = cl.create_some_context(interactive=True)
queue = cl.CommandQueue(ctx)

plan = Plan(a.shape, queue=queue, fast_math=False)
a_gpu_ = cl_array.to_device(ctx, queue, a.astype(np.complex64))
plan.execute(a_gpu_.data)

error = np.abs(np.sum(np.abs(a_gpu_.get()) - np.abs(af)) / af.size)
print "With PyFFT, single precision : ", error
plan.execute(a_gpu_.data, inverse=True)

error = np.abs(np.sum(np.abs(a_gpu_.get()) - np.abs(a)) / af.size)
print "With PyFFT, single precision. FFT => IFFT : ", error
fjarri commented 11 years ago

There are two small problems I have identified at the moment. First, since N is an integer, 1 / N in the scaling transformation is zero, which leads to the significant difference in results. Second, there are some appearances of float instead of double in the generated code, because the scaling transformation ignored the explicit complex128 type and applied min_scalar_type() anyway (fixed by a recent commit).
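As a minimal sketch of the integer-division pitfall (plain Python 2, independent of Tigger; the constant mirrors the one passed to scale_const in the attached script, and the suggested fix is only one possible way to write it):

N = 2 ** 16
print 1 / N    # 0 in Python 2: integer division truncates, so scale_const(1 / N) scales by zero
print 1.0 / N  # 1.52587890625e-05, the intended inverse-FFT scale

# One way to avoid it in the attached script (names taken from the script above):
# compi[name].connect(transformations.scale_const(type_(1.0 / N)), 'output', ['output_scaled'])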

This still does not explain why the error is the same in single and double precision, though; I will investigate that further.

tnorth commented 11 years ago

Oops, indeed. I am used to the from __future__ import division behaviour and I missed this integer division. Thanks, I'm waiting for your updates on that.
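For reference, a minimal illustration of that behaviour (plain Python 2, nothing specific to Tigger):

from __future__ import division  # must appear at the top of the module
print 1 / 2 ** 16                # 1.52587890625e-05 rather than 0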

tnorth commented 11 years ago

Thank you very much!